(57) ABSTRACT

A postfiltering process for improving the appearance of a 
video image includes motion compensated temporal filtering 
and spatial adaptive filtering. For each target pixel being 
filtered, the temporal filtering uses multiple motion vectors 
and one or more pixel values for a prior frame to determine 
one or more reference values for the target pixel. In one 
embodiment, a weighted average of multiple motion vectors 
for blocks near or containing the target pixel value provides 
a filter vector that points to a pixel value in the prior frame. 
This pixel value is a reference value for the target pixel value 
and is combined with the target pixel value in a filter 
operation. Alternatively, multiple motion vectors for blocks 
near or containing the target pixel value point to pixel values 
in the prior frame that are averaged to determine a reference 
value for the target pixel value. In each alternative, the 
weighting for the average is selected according to the 
position of the target pixel value. The spatial filtering 
determines a dynamic range of pixel values in a smaller 
block containing the target pixel value and a dynamic range 
of pixel values in a larger block containing the target pixel 
value. The two dynamic ranges suggest the image context of 
the target pixel, and an appropriate spatial filter for the target 
pixel is selected according to the suggested context. 

31 Claims, 5 Drawing Sheets 
VIDEO POSTFILTERING WITH MOTION-COMPENSATED TEMPORAL FILTERING
AND/OR SPATIAL-ADAPTIVE FILTERING

REFERENCE TO MICROFICHE APPENDIX

The present specification comprises a microfiche appendix. The total
number of microfiche sheets in the microfiche appendix is one. The
total number of frames in the microfiche appendix is 49.

COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material
that is subject to copyright protection. The copyright owner has no
objection to the facsimile reproduction by anyone of the patent
document or the patent disclosure, as it appears in the Patent and
Trademark Office patent files or records, but otherwise reserves all
copyright rights whatsoever.

BACKGROUND

1. Field of the Invention

This invention relates to systems for decoding video images and
particularly to methods for improving decoded video image quality by
removing coding artifacts and noise.

2. Description of Related Art

"Coding artifacts" are visible degradations in image quality that may
appear as a result of encoding and then decoding a video image using a
video compression method such as employed for the MPEG-1, MPEG-2,
H.261, or H.263 standard. For example, video encoding for each of the
MPEG-1, MPEG-2, H.261, and H.263 standards employs some combination
of: partitioning frames of a video image into blocks; determining
motion vectors for motion compensation of the blocks; performing a
frequency transform (e.g., a discrete cosine transform) on each block
or motion-difference block; and quantizing the resultant transform
coefficients. Upon decoding, common coding artifacts in a video image
include blockiness that results from discontinuity of block-based
motion compensation and inverse frequency transforms at block
boundaries and "mosquito" noise surrounding objects in the video image
as a result of quantization errors changing transform coefficients.
Sources other than encoding and decoding can also introduce noise that
degrades image quality. For example, transmission errors or noise in
the system recording a video image can create random noise in the
video image.

Postfiltering of a video image processes the video image to improve
image quality by removing coding artifacts and noise. For example,
spatial postfiltering can smooth the discontinuity at block boundaries
and reduce the prominence of noise. Such spatial filtering operates on
an array of pixel values representing a frame in the video image and
modifies at least some pixel values based on neighboring pixel values.
Spatial filtering can be applied uniformly or selectively to specific
regions in a frame. For example, selective spatial filtering at a
block edge (known locations within a frame) smoothes image contrast to
reduce blockiness. However, spatial filtering can undesirably make
edges and textures of objects in the image look fuzzy or indistinct,
and selective spatial filtering can cause "flashing" where the clarity
of the edges of an object changes as the object moves through areas
filtered differently.

Temporal filtering operates on a current array of pixel values
representing a current frame and combines pixel values from the
current array with pixel values from one or more arrays representing
prior or subsequent frames. Typically, temporal filtering combines a
pixel value in the current array with pixel values in the same
relative position in an array representing a prior frame under the
assumption that the area is similar. If noise or a coding artifact
affects a pixel value in the current array but not the related pixel
values in the prior frames, temporal filtering reduces the prominence
of the noise or coding artifacts. A problem with temporal filtering
arises from motion in the video image where the content of the image
in one frame shifts in the next frame so that temporal filtering
combines pixels in the current frame with visually dissimilar pixels
in prior frames. When this occurs, the contribution of the dissimilar
pixels creates a ghost of a prior frame in the current frame.
Accordingly, temporal filtering can introduce undesired artifacts in a
video image.

Postfiltering processes are sought that better remove coding artifacts
and noise while preserving image features and not introducing further
degradations.

SUMMARY

In accordance with the invention, a video postfiltering process
includes motion compensated temporal filtering and/or spatial adaptive
filtering. The motion compensated temporal filtering operates on each
target pixel value in an array representing a current frame of a video
image and combines each target pixel value with one or more pixel
values from an array representing a prior frame. The pixel values from
the prior frame alone or in combinations are sometimes referred to
herein as reference values. The reference values for a target pixel in
the current array are selected according to and depending on the
values of a motion vector for a block containing the target pixel
value and motion vectors for neighboring blocks. Using the motion
vectors of neighboring blocks in the selection of reference values
reduces ghosting when compared to temporal filtering without motion
vectors or using only the motion vector for the block containing the
target pixel.

In one embodiment of the invention, a vector (sometimes referred to
herein as a filter vector) for a target pixel is determined from a
weighted average of the motion vectors for the block containing the
target pixel and the neighboring blocks closest to the target pixel.
The weighting factors used in determining the filter vector for the
target pixel depend on the position of the target pixel within a
block. A pixel value for the target pixel is then filtered or combined
with one or more reference values that correspond to an area of the
prior frame identified by the filter vector.

An alternative embodiment of temporal filtering combines each target
pixel value with pixel values from a prior frame that are in areas
identified by the motion vectors for the block containing the target
pixel value and neighboring blocks. The pixel values from the prior
frame may be combined in a weighted average using weighting factors
selected according to the position of the target pixel value within a
block.

The adaptive spatial filter selects a filter operation for a target
pixel according to the level of coding artifacts and the presence of
important features. The level of coding artifacts depends on how well
the pixel values are coded as indicated by the quantization factor.
The dynamic range of the smallest coding unit, an 8x8 block in most of
the standardized encoding processes, is used to estimate the amount of
coding noise in the block. A large dynamic range usually indicates
more noise. To reduce blurring of image features, a second
dynamic range around the target pixel is computed and used in two
ways. The second dynamic range indicates the shape of the filter
required to avoid mixing pixels from different features together. The
second dynamic range also indicates the appropriate strength of the
filter. When the second dynamic range is close to the first dynamic
range, the target pixel is on or near image features, and a weak
filter is used. When the second dynamic range is smaller than a large
first dynamic range, the target pixel is likely to be noise around the
edges and a strong filter is used. Other combinations of the sizes of
the dynamic ranges result in the use of other filters.

Although the temporal filtering and spatial filtering are used in
combination to provide the best image quality, either may be used
alone in particular embodiments of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a video decoder implementing postfiltering in accordance
with an embodiment of the invention.

FIG. 2 shows a flow diagram for a motion compensated temporal
filtering process in accordance with an embodiment of the invention.

FIG. 3 illustrates motion vectors for a portion of a frame that is
divided into blocks.

FIG. 4 shows a flow diagram for a motion compensated temporal
filtering process in accordance with another embodiment of the
invention.

FIG. 5 illustrates pixel values from a prior frame that are combined
to form a reference value for a target pixel in a current frame.

FIG. 6 shows a spatial adaptive filter in accordance with an
embodiment of the invention.

Use of the same reference symbols in different figures indicates
similar or identical items.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

In accordance with an aspect of the invention, a video postfilter
employs motion compensated temporal filtering and spatial adaptive
filtering to improve image quality and remove coding artifacts. The
temporal filtering uses motion vectors from multiple blocks to
determine a reference value that is combined with the target pixel
value being filtered. The reference value selected using multiple
motion vectors better matches the target pixel value because the
combination of motion vectors can better approximate motion of
individual pixels than can a motion vector that indicates average
motion of an entire block of pixel values. The spatial adaptive
filtering uses the dynamic ranges of pixel values in blocks of
different sizes to determine the visual context of the target pixel,
and selects a filter for the target pixel according to the determined
visual context. Such postfiltering processes improve video image
quality and are applicable to any video image. However, the
postfiltering processes are particularly suited for postfiltering a
video image decoded in accordance with a video standard such as the
well-known MPEG-1, MPEG-2, H.261, or H.263 video standard.

FIG. 1 shows a block diagram of a decoding system 100 in accordance
with an exemplary embodiment of the invention. Decoding system 100 may
be implemented in software executed by a general purpose computer or
in specialized hardware designed to implement the specific functions
of system 100 as described herein. As a specific example of an
application of the invention, decoding system 100 decodes a video
signal complying with the H.261 standard for video conferencing.
Alternative applications will be apparent in view of this disclosure.

In accordance with the H.261 standard, three arrays of pixel values, a
Y array, a U array, and a V array, represent each frame of the video
image and are respectively associated with luma Y and chroma U and V
of associated pixels in the frame. In the exemplary embodiment of the
invention shown in FIG. 1, only the Y arrays of frames are
postfiltered since the Y arrays have the greatest influence on the
appearance of the video image. The Y array for a frame contains 288
rows and 352 columns of pixel values where each pixel value indicates
the luma for a pixel in a standard frame size, which is 288x352 pixels
for the H.261 CIF picture format. The U and V arrays each contain 144
rows and 176 columns of pixel values where each pixel value indicates
U or V chroma for four pixels in the associated frame. The H.261
standard partitions each frame into 16x16-pixel areas where each area
is represented by a macroblock of pixel values. Each macroblock
includes one 16x16 block (or four 8x8 blocks) from the Y array, one
8x8 block from the U array, and one 8x8 block from the V array. During
encoding of each macroblock in a current frame, an encoder uses a
match criterion such as mean square error, mean quadratic error, or
mean absolute error to search for a 16x16-pixel area in a prior frame
that is visually similar to the 16x16-pixel area associated with the
macroblock. A motion vector for the macroblock indicates an offset
from the position of the similar area in the prior frame to the
position of the associated area in the current frame. Each macroblock
is then intercoded or intracoded depending on whether the search finds
a good match (i.e., similar) area in the prior frame. Intercoding
determines a difference block that is the difference between a block
representing the area in the current frame and a block in the prior
frame indicated by the motion vector, breaks the difference block into
8x8 difference blocks, and performs a discrete cosine transform (DCT)
on each of the 8x8 difference blocks. Intracoding performs a discrete
cosine transform on the 8x8 blocks of pixel values rather than on the
difference blocks. Following intercoding or intracoding, the transform
coefficients are quantized and then transmitted with motion vectors
for the intercoded blocks in an encoded bit stream representing the
video image.

Decoding system 100 includes a decoder 110, a block boundary filter
(BBF) 120, a motion compensated temporal filter 130, and a spatial
adaptive filter 140 that process an incoming bit stream complying with
the H.261 standard. Decoder 110 is a conventional decoder that decodes
the bit stream to generate arrays of pixel values representing decoded
frames that form a video image. In the decoding, decoder 110
identifies a quantization factor MQUANT from a quantization factor
list 115, dequantizes transform coefficients, performs an inverse
discrete cosine transformation (IDCT) on 8x8 blocks of dequantized
transformation coefficients, and for intercoded blocks, sums the
resulting difference blocks with the similar blocks that the motion
vectors identify in the prior frame. BBF 120, motion compensated
temporal filter 130, and spatial adaptive filter 140 postfilter the
decoded video image from decoder 110 to reduce noise and coding
artifacts and improve image quality.

Block boundary filter 120 reduces blockiness resulting from
discontinuity at the boundaries of 8x8 blocks that were subject to
independent DCTs and changes the pixel values at the boundaries of the
8x8 blocks to smooth transitions across the block boundaries.
Specifically, if the columns of
an array of pixel values are numbered from 0 to 351, BBF 120 changes
pixel values in columns 8n and 8n+7 for 0≤n≤43; and if the rows are
numbered from 0 to 287, BBF 120 changes pixel values in rows 8m and
8m+7 for 0≤m≤35. In the exemplary embodiment of the invention, BBF 120
adapts or changes according to the quantization step size MQUANT for
the block containing the pixel being filtered. For the H.261 standard,
the encoded bit stream indicates the quantization step size for each
encoded block. Table 1 indicates the coefficients for a five-tap
horizontal filter for use on a target pixel in a column j and a
three-tap vertical filter for use on a target pixel in a row i.

TABLE 1

The filter coefficients for each filter in Table 1 sum to 32 so that
the result of a filter operation on a target pixel value Pij is the
sum of products of pixel values and filter coefficients right shifted
by 5 bits (i.e., divided by 32).

BBF 120 provides pixel values of a block-boundary-filtered frame to
motion compensated temporal filter 130. Motion compensated temporal
filter 130 includes a dynamic noise reduction filter 132 that combines
pixel values for the current frame with reference values derived from
pixel values for a prior frame 138. A reference value generator 136
determines the reference value using a list 134 of motion vectors for
the current frame and the pixel values for prior frame 138.

FIG. 2 illustrates a flow diagram of a temporal filtering process 200
that filter 130 implements. An initial step 210 in process 200
completes the motion vector list 134 for the current frame. In the
exemplary embodiment, decoder 110 writes to list 134 the motion
vectors as decoded from the incoming bit stream. Alternatively,
temporal filter 130 can determine the motion vectors from the pixel
values for the decoded current and prior frames, but determining
motion vectors increases filter complexity. Additionally, the decoded
motion vectors from decoder 110 typically provide better temporal
filtering because an encoder selected encoded motion vectors using
image data before compression. When the current frame is fully
decoded, each macroblock not having a decoded motion vector (e.g.,
each intracoded macroblock) is assigned a motion vector of length zero
or an illegal motion vector. Subsequent steps in process 200 skip
intracoded macroblocks and replace each illegal motion vector with a
motion vector for a neighboring block.

Once list 134 is complete, reference value generator 136 in step 220
selects a target pixel in the current frame and identifies a
macroblock containing the target pixel. A step 230 selects motion
vectors for the target pixel, and a step 240 uses the selected motion
vectors to determine a filter vector for the target pixel. The filter
vector indicates an offset to the position of the target pixel from
the position of a pixel in the prior frame that will be combined with
the target pixel in a filter operation. In accordance with an aspect
of the invention, the filter vector is derived using a weighted
average of the motion vectors for the macroblock containing the target
pixel and neighboring macroblocks. For example, Equation 1 defines a
filter vector FVij for target pixel Pij in the exemplary embodiment of
the invention.

FVij = round(Aij*MVA + Bij*MVB + Cij*MVC + Dij*MVD)        Equation 1

In Equation 1, Aij, Bij, Cij, and Dij are weighting factors, MVA, MVB,
MVC, and MVD are the motion vectors selected in step 230, and round( )
is a function that rounds its argument to the nearest integer.

To illustrate the selection of motion vectors, FIG. 3 shows a portion
of a current frame represented by six 16x16-pixel areas 310, 320, 330,
340, 350, and 360. Macroblocks representing areas 310, 320, 330, 340,
350, and 360 have respective motion vectors MV1, MV2, MV3, MV4, MV5,
and MV6 that identify visually similar 16x16-pixel areas in the prior
frame. For a specific target pixel, the motion vector for the
macroblock representing the target pixel may not indicate a similar
pixel in the prior frame if motion of an object including the target
pixel differs from the average motion for the block. The motion of the
target pixel may be more like the average motion of a neighboring
block rather than the block containing the target pixel. Accordingly,
in the exemplary embodiment of the invention, motion vectors MVA, MVB,
MVC, and MVD are respectively the motion vector for the block
containing the target pixel, the motion vector for the nearest
neighboring block to the right or left of the target pixel, the motion
vector for the nearest neighboring block above or below the target
pixel, and the motion vector for the nearest neighboring block on a
diagonal relative to the block containing the target pixel. For
example, if the target pixel value Pij is in a lower-right quadrant
311 of block 310 as illustrated in FIG. 3, selected motion vectors
MVA, MVB, MVC, and MVD would be respectively motion vectors MV1, MV2,
MV3, and MV4. If target pixel value Pij were in lower-right quadrant
312, the selected motion vectors MVA, MVB, MVC, and MVD would
respectively be motion vectors [illegible], MV5, and MV6. If a motion
vector MVC or MVD would correspond to a block beyond the edge of the
frame, a block in the frame but closest to the desired block provides
that motion vector.

In Equation 1, weighting factors Aij, Bij, Cij, and Dij depend on
indices i and j which respectively indicate the vertical and
horizontal positions of the target pixel Pij in a quadrant of a block.
Indices i and j range from 1 to 8 for an 8x8 quadrant containing
target pixel Pij and have minimum values near the center of the 16x16
block. Equations 2 give the self contribution weighting factor Aij,
the right/left neighbor weighting factor Bij, the upper/lower neighbor
weighting factor Cij, and the diagonal neighbor weighting factor Dij
for an exemplary embodiment of the invention.

Aij = (16.5 - i)*(16.5 - j)/256
Bij = (16.5 - i)*(j - 0.5)/256
Cij = (i - 0.5)*(16.5 - j)/256
Dij = (i - 0.5)*(j - 0.5)/256        Equations 2

The weighting factors for the possible target pixel locations in an
8x8 quadrant are selected according to the likelihood that the motion
of an object including target pixel Pij is similar to the motion
vector associated with the weighting factor. For example, if target
pixel Pij is near the center of block 310, the motion of target pixel
Pij is likely to be
similar to motion vector MV1. Accordingly, weighting fac- 
tor Aij dominates the other weighting factors when indices 
i and j indicate a target point near the center of block 310 
(i.e., if i and j are both at or near 1.) As index j or i increases, 
target pixel Pij nears the boundary of block 320 or 330, and 
coefficient Bij or Cij increases the contribution of motion 
vector MV2 or MV3. 
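
The weighting scheme and the filter vector of Equation 1 can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation from the microfiche appendix. The weight formulas are this sketch's reading of Equations 2, chosen so that the four weights sum to one for every i and j. The function names, the (dx, dy) tuple layout for motion vectors, and the handling of exact .5 ties in rounding are assumptions.

```python
import math


def quadrant_weights(i, j):
    """Return (A, B, C, D) for quadrant indices i, j in 1..8.

    i is the vertical index and j the horizontal index, both smallest
    near the center of the 16x16 block (assumed reading of Equations 2).
    """
    if not (1 <= i <= 8 and 1 <= j <= 8):
        raise ValueError("i and j must be in the range 1..8")
    a = (16.5 - i) * (16.5 - j) / 256.0  # block containing the target pixel
    b = (16.5 - i) * (j - 0.5) / 256.0   # left/right neighbor
    c = (i - 0.5) * (16.5 - j) / 256.0   # upper/lower neighbor
    d = (i - 0.5) * (j - 0.5) / 256.0    # diagonal neighbor
    return a, b, c, d


def round_nearest(x):
    """Round to the nearest integer; .5 ties go up (an assumption)."""
    return int(math.floor(x + 0.5))


def filter_vector(i, j, mva, mvb, mvc, mvd):
    """Equation 1: rounded weighted average of four (dx, dy) motion vectors."""
    weights = quadrant_weights(i, j)
    vectors = (mva, mvb, mvc, mvd)
    fx = sum(w * v[0] for w, v in zip(weights, vectors))
    fy = sum(w * v[1] for w, v in zip(weights, vectors))
    return round_nearest(fx), round_nearest(fy)
```

Near the block center (i = j = 1) the weight for the containing block is about 0.94, so the filter vector stays close to that block's own motion vector. At i = j = 8 the four weights are nearly equal, so the neighbors contribute almost as much as the containing block.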

In process 200 (FIG. 2), a step 250 uses the filter vector 
to identify a reference value that is inserted into a reference 
array. The inserted value is inserted at the position corre- 
sponding to the target pixel but is from a position offset from 
the position of the target pixel by the amount indicated by 
the filter vector. Steps 220 to 250 are repeated for each pixel 
in the current frame until the reference array is complete in 
step 260. A filtering step 270 combines pixel values from the 
array representing the current decoded frame with reference 
values from the reference array. Equation 3 indicates the 
form of a filtering that combines target pixel value Pij for the 
current frame and a reference value Rij from the reference 
array to generate an output pixel value Oij. 



Oij = Pij - F(Pij - Rij)        Equation 3



Filter function F(Pij-Rij) is a function of a difference Δ 
between decoded pixel value Pij and the associated refer- 
ence value Rij. For a large difference Δ, filter function F(Δ) 
is zero so that no temporal filtering is performed if reference 
value Rij is not a good match for decoded pixel value Pij. 
The filter function F(Δ) may further depend on coding 
parameters such as the macroblock quantization step size Q. 
Table 2 illustrates a filter function F(Δ,Q) suitable for the 
exemplary embodiment of the invention. 
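
A minimal Python sketch of the filter operation of Equation 3 follows. It is illustrative only. The list `F_Q1` holds the Q = 1 column of Table 2 as this sketch reads it. Table 2 lists only non-negative differences, so applying the table to the magnitude of the difference and restoring its sign is an assumption, as are the function names and the restriction to integer differences.

```python
# Q = 1 column of Table 2, indexed by the magnitude of the difference
# (0..21). Beyond 21 the filter function is zero.
F_Q1 = [0.00, 0.29, 0.44, 0.87, 1.16, 1.34, 1.53, 1.73, 1.93, 2.11, 2.32,
        2.08, 1.89, 1.69, 1.48, 1.30, 1.11, 0.93, 0.74, 0.56, 0.37, 0.19]


def filter_function(delta, table=F_Q1):
    """Look up F for an integer difference delta = Pij - Rij.

    Assumption: F is odd, so F(-delta) = -F(delta).
    """
    magnitude = abs(delta)
    if magnitude >= len(table):
        return 0.0  # poor match: no temporal filtering
    value = table[magnitude]
    return value if delta >= 0 else -value


def temporal_filter(p, r, table=F_Q1):
    """Equation 3: Oij = Pij - F(Pij - Rij)."""
    return p - filter_function(p - r, table)
```

With these values, a target pixel of 110 and a reference of 100 give an output of about 107.68, pulled toward the reference. A reference of 50 leaves the target pixel unchanged because the difference exceeds 21.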
TABLE 2

Filter Function F(Δ,Q) of Difference Δ and Quantization Step Q

Δ\Q     1     2     3     4     5     6     7     8     9    >9
  0     0     0     0     0     0     0     0     0     0     0
  1  0.29  0.34  0.39  0.44  0.49  0.54  0.59  0.64  0.69  0.74
  2  0.44  0.52  0.59  0.67  0.74  0.81  0.89  0.96  1.04  1.11
  3  0.87  1.01  1.16  1.30  1.45  1.59  1.74  1.88  2.03  2.17
  4  1.16  1.36  1.55  1.75  1.94  2.14  2.33  2.53  2.72  2.92
  5  1.34  1.57  1.79  2.01  2.24  2.46  2.69  2.91  3.14  3.36
  6  1.53  1.79  2.04  2.30  2.56  2.81  3.07  3.33  3.58  3.84
  7  1.73  2.02  2.30  2.59  2.88  3.17  3.46  3.75  4.04  4.32
  8  1.93  2.25  2.57  2.90  3.22  3.54  3.86  4.19  4.51  4.83
  9  2.11  2.47  2.82  3.17  3.52  3.88  4.23  4.58  4.94  5.29
 10  2.32  2.71  3.10  3.49  3.88  4.27  4.65  5.04  5.43  5.82
 11  2.08  2.43  2.78  3.13  3.48  3.83  4.17  4.52  4.87  5.22
 12  1.89  2.21  2.52  2.84  3.16  3.47  3.79  4.10  4.42  4.74
 13  1.69  1.97  2.25  2.53  2.81  3.10  3.38  3.66  3.94  4.22
 14  1.48  1.72  1.97  2.22  2.47  2.71  2.96  3.21  3.45  3.70
 15  1.30  1.52  1.74  1.95  2.17  2.39  2.61  2.82  3.04  3.26
 16  1.11  1.30  1.49  1.67  1.86  2.05  2.23  2.42  2.61  2.80
 17  0.93  1.09  1.24  1.40  1.55  1.71  1.86  2.02  2.18  2.33
 18  0.74  0.87  0.99  1.12  1.24  1.37  1.49  1.62  1.74  1.87
 19  0.56  0.65  0.75  0.84  0.94  1.03  1.12  1.22  1.31  1.41
 20  0.37  0.44  0.50  0.56  0.63  0.69  0.75  0.82  0.88  0.94
 21  0.19  0.22  0.25  0.29  0.32  0.35  0.38  0.42  0.45  0.48
>21     0     0     0     0     0     0     0     0     0     0

The filter function coefficients and reference values can be stored
using double precision, e.g., 16 bits of precision where 8 bits are
normally used for pixel values, to reduce rounding errors.

The exemplary embodiment of the temporal filtering process illustrated
in FIG. 2 and described above may be varied in a number of ways in
keeping with the invention. For example, step 230 may select more or
fewer than four motion vectors per target pixel. In particular, step
230 could select three motion vectors (the motion vectors for the
block containing the target pixel, the nearest neighboring block to
the left or right of the target pixel, and the nearest neighboring
block above or below the target pixel) or nine motion vectors (the
motion vectors for the block containing the target pixel and the eight
nearest neighboring blocks). Further, determining the filter vector in
step 240 can use a variety of different weighting factors or functions
of the selected motion vectors and is not limited to a weighted
average or particular weighting factors. Additionally, each reference
value can be combined with a target pixel in a filtering operation
immediately after step 240 without ever generating the reference
array. Further, a variety of filter functions not limited to the form
of Equation 3 described above may be employed. For example, filters
can combine each target pixel with more than one reference value from
the reference array.

FIG. 4 illustrates an alternative temporal filtering process 400 in
accordance with the invention. Process 400 begins with the same steps
210, 220, and 230 described above in reference to FIG. 2. Step 210
completes the list 134 of motion vectors for macroblocks representing
the current frame. Step 220 selects a target pixel in the current
decoded frame, and step 230 selects a set of motion vectors from list
134 for the target pixel. The selected motion vectors include, for
example, the motion vector for the block containing the target pixel,
the motion vector for the nearest neighboring block to the left or
right of the target pixel, the motion vector for the nearest
neighboring block above or below the target pixel, and the motion
vector for the nearest neighboring block at a diagonal with the block
containing the target pixel. For example, referring to target pixel
Pij in FIG. 3, step 230 selects motion vectors MV1, MV2, MV3, and MV4.

With one end at the target pixel, each of the selected motion vectors
identifies a pixel value in the array representing the prior frame.
FIG. 5 shows four pixels P1, P2, P3, and P4 which respective motion
vectors MV1, MV2, MV3, and MV4 identify for target pixel Pij. The
pixel values that the selected motion vectors identify in the prior
frame are combined with the target pixel value in a filter operation.
For process 400, the filter operation combines the target pixel value
with a reference value that is a weighted average of the
pixel values that the selected motion vectors identify. Step 440
selects the factors for the weighted average, and step 450 determines
the weighted average that will be a reference value. For example,
Equation 4 defines a reference value Rij for a target pixel Pij.

Rij = Aij*PA + Bij*PB + Cij*PC + Dij*PD        Equation 4

In Equation 4, Aij, Bij, Cij, and Dij are the factors for the weighted
average and may be, for example, as defined in Equation 2 above. PA,
PB, PC, and PD are pixel values for the prior frame that motion
vectors MVA, MVB, MVC, and MVD identify for target pixel value Pij.
Motion vectors MVA, MVB, MVC, and MVD are the motion vectors
respectively for the block containing the target pixel value, the
nearest neighboring block to the left or right of the target pixel,
the nearest neighboring block above or below the target pixel, and the
nearest neighboring block at a diagonal with the block containing the
target pixel. For target pixel Pij in quadrant 311 as illustrated in
FIG. 3, the selected motion vectors MVA, MVB, MVC, and MVD are
respectively motion vectors MV1, MV2, MV3, and MV4, and pixel values
PA, PB, PC, and PD are pixel values P1, P2, P3, and P4 in the prior
frame at positions illustrated in FIG. 5. Step 460 inserts the
reference value Rij into the reference array. When the reference array
is complete, step 270 combines the current array with the reference
array in a filter operation such as defined by Equation 3 and Table 2
above. Alternatively, each pixel value Pij can be combined with
reference values Rij as the reference values become available.

After filtering every pixel in the current array, temporal filter 130
rounds the filtered current array to normal pixel value precision
(e.g., 8 bits) and provides the rounded array to spatial adaptive
filter 140. FIG. 6 illustrates an embodiment of spatial adaptive
filter 140. Spatial adaptive filter 140 includes a filter strength
select unit 610 that selects a filter strength for the filtering of
each target pixel Pij from a current frame 650. Filter strength select
unit 610 bases selection of the filter strength on a dynamic range DR3
of pixel values in a smaller block containing the target pixel, a
dynamic range DR8 of pixel values in a larger block containing the
target pixel, and the quantization step size MQUANT for the macroblock
containing the target pixel. A dynamic range is the difference between
the largest and the smallest pixel values in an area. In the
embodiment of FIG. 6, the smaller block is a 3x3 block centered on the
target pixel value, and the larger block is an 8x8 block that was

[...] texture. Table 3 shows combinations of the dynamic ranges, the
image content suggested by each combination, and the appropriate level
of filtering for each combination.

TABLE 3

Filter           DR8 is small         DR8 is moderate        DR8 is large

DR3 is small     Weak Filter:         Medium Filter:         Strong Filter:
                 [illegible]          Target likely noise    [illegible] near
                                      or detail [illegible]  an edge

DR3 is moderate  Very Weak Filter:    Weak Filter:           Medium Filter:
                 [illegible]          Target likely          Target could be
                                      image texture          noise or detail

DR3 is large     Very Weak Filter:    Very Weak Filter:      [illegible]
                 [illegible]          [illegible]            edge

The largest change between adjacent pixel values similarly measures
content, but determining the largest change is more complex than
determining the dynamic range. To determine a dynamic range, units 620
and 630 determine the difference between the largest and smallest
pixel values in respective 3x3 and 8x8 blocks. In the exemplary
embodiment, each pixel value is an 8-bit value indicating the luma for
a pixel so that the possible dynamic ranges are from 0 to 255. The
dynamic range for the small block can be greater than the dynamic
range for the larger block if the smaller block contains pixel values
from outside the larger block.

To control the filter strength applied to a target pixel in the
current frame, filter strength select unit 610 generates a parameter
p, and the filter applied to a target pixel is of the form given in
Equation 5.

Oij = round_and_clip((1-p)*Pij + p*F(Pij))        Equation 5

In Equation 5, Oij is the output pixel value from filter 140 for
target pixel Pij, F(Pij) is the output pixel value of a spatial filter
640 in filter 140, and round_and_clip is a function that rounds its
argument to the nearest integer and clips that result according to the
range of allowed pixel values. Parameter p is restricted to a range
from 0 to 1, where the strength of the filter increases with parameter
p. For p equal to zero, output pixel value Oij is equal to target
pixel Pij unfiltered. For p equal to one, output pixel value Oij is
equal to the result F(Pij) from spatial filter 640.
subjected to a DCT during encoding. It has been found that Fllter 640 can be an y desired spatial filter. In an exem- 

similarities and differences between dynamic ranges DR3 P lar ? embodiment of the invention, spatial filter is a "5x5 

and DR8 for the smaller and larger blocks suggest the image ^ filter " * at excludes from the filter operation pixel 

content of the area including and surrounding the target 50 values that significantly differ from a target pixel value being 

pixel. Filter select unit 610 selects a filter as appropriate for filtercd * Tablc 4 lllustratcs the filter coefficients for the 

the image content suggested by the dynamic ranges. For exemplary embodiment of filter 640. 
example, a large dynamic range suggests that the associated 

block contains an edge of an object in the frame. The smaller TABLE 4 

block having a relatively small dynamic range DR3 and the 55 
larger block having a relatively large dynamic range DR8 
suggests that the larger block contains an edge of an object 
and the smaller block is near but does not contain a portion 
of that edge. In this case, the target pixel is strongly filtered 
because coding artifacts are common near sharp edges 60 
within a block that has been DCT transformed. A 3x3 region 
containing a large dynamic range suggests that the target 
pixel is at the edge of an object. In this case the target pixel 

is weakly filtered to avoid blurring of the edge. Both F(Pij) is the sum of the product of the filter coefficients from 

dynamic ranges DR3 and DR8 being moderate suggests that 65 Table 3 and pixel values. Each pixel value in a product is 

the target pixel is part of texture in the image frame, and a either the pixel value having a position relative to the target 

weak filter is applied to the target pixel to avoid blurring the pixel as indicated for the filter coefficient in the product or 
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the target pixel value if the pixel value in the position 
indicated for the filter coefficient differs from the target pixel 
value by more than a likeness threshold LT. For the exem- 
plary embodiment, Equation 6 shows the dependence of 
likeness threshold LT on the dynamic range DR3 of the 3x3 
block in the exemplary embodiment of the invention. 



LT = 10 + 0.625*DR3          Equation 6
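The per-pixel spatial filtering described above can be sketched as follows. This is a minimal illustration, not the listing from the microfiche appendix: it assumes Equation 6 reads LT = 10 + 0.625*DR3, it handles interior pixels only, and because the Table 4 coefficients are not available here it uses a uniform 5x5 kernel (`UNIFORM_5X5`) as a purely hypothetical stand-in. The function names are also invented for the example.

```python
import math

def dynamic_range(img, i, j, half=1):
    """Largest minus smallest pixel value in the block centered on (i, j)."""
    vals = [img[r][c]
            for r in range(i - half, i + half + 1)
            for c in range(j - half, j + half + 1)]
    return max(vals) - min(vals)

def round_and_clip(x, lo=0, hi=255):
    """Round to the nearest integer, then clip to the allowed pixel range."""
    return max(lo, min(hi, math.floor(x + 0.5)))

def filter_pixel(img, i, j, beta, kernel):
    """Equation 5 for an interior pixel; kernel is a square list of weights."""
    half = len(kernel) // 2
    p = img[i][j]
    dr3 = dynamic_range(img, i, j)      # 3x3 block around the target
    lt = 10 + 0.625 * dr3               # assumed reading of Equation 6
    acc = 0.0
    for di in range(-half, half + 1):
        for dj in range(-half, half + 1):
            q = img[i + di][j + dj]
            if abs(q - p) > lt:
                q = p                   # unlike neighbor: use the target value
            acc += kernel[di + half][dj + half] * q
    return round_and_clip((1 - beta) * p + beta * acc)

# Hypothetical stand-in for the Table 4 coefficients (weights sum to 1).
UNIFORM_5X5 = [[1 / 25] * 5 for _ in range(5)]

flat = [[100] * 7 for _ in range(7)]
edge = [row[:] for row in flat]
edge[3][4] = 200    # differs from the target by more than LT, so it is excluded
noisy = [row[:] for row in flat]
noisy[3][4] = 120   # within LT, so it takes part in the average
```

With these inputs the pixel next to the 200-valued neighbor is left at 100 even at full strength, while the 120-valued neighbor pulls the output up by one level; setting `beta` to zero returns the target pixel unchanged, as the text states.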



Tables 5.1 and 5.2 below indicate the selection of parameter
β for different values of dynamic ranges DR3 and DR8
and the macroblock quantization step size MQUANT. Table
5.1 indicates the values of parameter β when the quantization
step size is six. For quantization step sizes MQUANT
less than six the values in Table 5.1 are scaled by MQUANT/6.
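How MQUANT adjusts a pair of table entries can be sketched as below. The sketch combines the scaling rule just stated with the linear interpolation rule that the text gives after Table 5.2; the function name and the choice of 6 and 11 as the interpolation endpoints are assumptions made for illustration.

```python
def beta_for_mquant(beta_6, beta_11, mquant):
    """Pick the filter strength from two table entries for the same DR3/DR8 cell.

    beta_6  : entry from Table 5.1 (quantization step size of six)
    beta_11 : entry from Table 5.2 (quantization step size greater than 10)
    """
    if mquant <= 6:
        return beta_6 * mquant / 6      # scale Table 5.1 by MQUANT/6
    if mquant >= 11:
        return beta_11                  # use Table 5.2 directly
    w = (mquant - 6) / 5                # assumed endpoints: 6 and 11
    return (1 - w) * beta_6 + w * beta_11
```

For the cell with DR3 < 5 and DR8 < 5 (entries .25 and .5), a step size of 3 gives 0.125 and a step size halfway between the endpoints gives 0.375.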



TABLE 5.1

Parameter β for MQUANT = 6

                                          DR8
DR3   <5    <10   <15   <20   <25   <30   <40   <50   <60   <70   <90   <120  <160  <256

<5    .25   .25   .25   .25   .25   .25   .32   .38   .44   1     1     1
<10   .10   .25   .25   .30   .30   .30   .35   .40   .45   1     1     1
<15   .05   .15   .30   .30   .30   .25   .30   .30   .35   .9    1     1
<20   0     0     .15   .15   .15   .10   .15   .20   .25   .6    .8    1
<25   0     0     0     0     0     0     .10   .15   .20   .5    .7    .9
<30   0     0     0     0     0     0     .05   .10   .15   .4    .5    .8    .9
<40   0     0     0     0     0     0     0     .05   .1    .3    .5    .7    .9    .9
<50   0     0     0     0     0     0     0     0     0     .3    .4    .5    .7    .8
<60   0     0     0     0     0     0     0     0     0     .3    .3    .4    .5    .6
<70   0     0     0     0     0     0     0     0     0     .3    .3    .3    .4    .4
<90   0     0     0     0     0     0     0     0     0     .3    .3    .3    .3    .3
<120  0     0     0     0     0     0     0     0     0     .3    .3    .3    .3    .3
<160  0     0     0     0     0     0     0     0     .3    .3    .3    .3    .3    0
<256  0     0     0     0     0     0     0     0     .3    .3    .3    .3    .3    0

Table 5.2 indicates the values of parameter β for average
quantization greater than 10.

TABLE 5.2

Parameter β for MQUANT ≥ 11

                                          DR8
DR3   <5    <10   <15   <20   <25   <30   <40   <50   <60   <70   <90   <120  <160  <256

<5    .5    .5    .5    .5    .5    .5    .63   .76   .89   1     1     1
<10   .2    .5    .5    .6    .6    .6    .7    .8    .9    1     1     1
<15   .1    .3    .6    .6    .6    .5    .6    .6    .7    .9    1     1
<20   .1    .1    .3    .3    .3    .2    .3    .4    .5    .6    .8    1
<25   .1    .1    .1    .1    .1    .1    .2    .3    .4    .5    .7    .9
<30   .1    .1    .1    .1    .1    .1    .1    .2    .3    .4    .5    .8    .9
<40   .1    .1    .1    .1    .1    .1    .1    .1    .2    .3    .5    .7    .9    .9
<50   .1    .1    .1    .1    .1    .1    .1    .1    .1    .3    .4    .5    .7    .8
<60   .1    .1    .1    .1    .1    .1    .1    .1    .1    .3    .3    .4    .5    .6
<70   .1    .1    .1    .1    .1    .1    .1    .1    .1    .3    .3    .3    .4    .4
<90   .1    .1    .1    .1    .1    .1    .1    .1    .1    .3    .3    .3    .3    .3
<120  .1    .1    .1    .1    .1    .1    .1    .1    .1    .3    .3    .3    .3    .3
<160  .1    .1    .1    .1    .1    .1    .1    .1    .1    .3    .3    .3    .3    .3
<256  .1    .1    .1    .1    .1    .1    .1    .1    .1    .3    .3    .3    .3    .3

For quantization step size MQUANT greater than 6 but less
than 11, parameter β is determined by linear interpolation
between a value from Table 5.1 and a value from Table 5.2.

The microfiche appendix contains a C-language program
listing for a software embodiment of a postfilter in accordance
with an exemplary embodiment of the invention.

Although the invention has been described with reference
to particular embodiments, the description is only an
example of the invention's application and should not be
taken as a limitation. Various adaptations and combinations
of features of the embodiments disclosed are within the
scope of the invention as defined by the following claims.

We claim:

1. A method for improving appearance of a video image,
comprising:

representing a first frame in the video image by a first
array of pixel values and a second frame in the video
image by a second array of pixel values;

selecting a plurality of motion vectors for a target pixel
value in the first array, wherein each motion vector
corresponds to a block of pixel values in the first array
and identifies a block of pixel values in the second
array;

determining a reference value for the target pixel value,
wherein the reference value depends on the motion
vectors selected for the target pixel value and one or
more pixel values from the second array; and

combining the target pixel value with the reference value
in a filter operation that generates an output pixel value
for a third array, the third array representing improved
version of the first frame, wherein the output pixel
value is equal to the target pixel value if a difference
between the target value and the reference value is
greater than a threshold value and is equal to a linear
combination of the target pixel value and the reference
value if the difference is not greater than the threshold
value.

2. The method of claim 1, wherein selecting the motion
vectors comprises:

selecting a first motion vector that corresponds to a first
block containing the target pixel value; and

selecting a second motion vector that corresponds to a
second block neighboring the first block.

3. The method of claim 2, wherein the second block abuts
the first block, and of blocks that abut the first block, the
second block has a boundary closest to the target pixel value.

4. The method of claim 3, wherein determining the
reference value for the target pixel value comprises:

combining the motion vectors selected for the target pixel
value to generate a filter vector; and

selecting as the reference value a pixel value in the second
array, at a position offset from a position of the target
pixel value by an amount indicated by the filter vector.

5. The method of claim 4, wherein combining the motion
vectors comprises:

selecting weighting factors that depend on the position of
the target pixel value in the first array; and

determining a weighted average of the motion vectors
using the selected weighting factors.

6. The method of claim 3, wherein determining the
reference value for the target pixel value comprises:

for each of the motion vectors selected for the target pixel
value, identifying a pixel value that is in the second
array, at a position that is offset from a position
corresponding to the target pixel value by an amount
indicated by the motion vector; and

combining the pixel values identified to determine the
reference value for the target pixel value.

7. The method of claim 6, wherein combining the pixel
values comprises:

selecting weighting factors that depend on the position of
the target pixel value in the first array; and

determining a weighted average of the identified pixel
values using the selected weighting factors.

8. The method of claim 1, wherein determining the
reference value for the target pixel value comprises:

combining the motion vectors selected for the target pixel
value to generate a filter vector; and

selecting as the reference value a pixel value from the
second array, wherein the pixel value selected is in the
second array, at a position that is offset from a position
corresponding to the target pixel value by an amount
indicated by the filter vector.

9. The method of claim 8, wherein combining the motion
vectors comprises:

selecting weighting factors that depend on the position of
the target pixel value in the first array; and

determining a weighted average of the motion vectors
using the selected weighting factors.

10. The method of claim 1, wherein determining the
reference value for the target pixel value comprises:

for each of the motion vectors selected for the target pixel
value, identifying a pixel value that is in the second
array, at a position offset from a position corresponding
to the target pixel value by an amount indicated by the
motion vector; and

combining the pixel values identified to determine the
reference value for the target pixel value.

11. The method of claim 10, wherein combining the pixel
values comprises:

selecting weighting factors that depend on the position of
the target pixel value in the first array; and

determining a weighted average of the identified pixel
values using the selected weighting factors.

12. The method of claim 1, further comprising decoding
a bit stream representing the video image, wherein:

the decoding extracts from the bit stream motion vectors
that are required for further decoding of the bit stream;
and

selecting the plurality of motion vectors for the target
pixel value comprises selecting a motion vector
extracted from the bit stream.

13. The method of claim 12, wherein the bit stream is
encoded according to a video standard selected from a group
consisting of the MPEG-1 standard, the MPEG-2 standard,
the H.261 standard, and the H.263 standard.

14. The method of claim 12, wherein decoding further 
includes determining a quantization factor from the bit 
stream. 

15. The method of claim 1, further comprising for each 
pixel value in the first array, repeating the selecting, 
determining, and combining steps with the pixel value as the 
target pixel value. 

16. A method for improving appearance of a video image, 
comprising: 

determining motion vectors for first areas in a first frame 
of the video image that is represented by a first array of 
pixel values, each motion vector corresponding to a 
first area in the first frame and a second area in a second 
frame, wherein image content of the second area in the 
second frame is similar to the image content of the first 
area in the first frame; 

determining for each pixel in the first frame a reference 
vector that is a combination of a motion vector for a 
first area containing the pixel and one or more of the 
motion vectors for adjacent first areas; 

generating a reference array containing reference values, 
wherein each reference value in the reference array is 
equal to the pixel value at a relative position in the 
second array that is offset from a position of the 
reference value by an amount indicated by the refer- 
ence vector; and 

generating a filtered array representing an improved ver- 
sion of the first frame, wherein the filtered array con- 
tains pixel values that are combinations of pixel values 
from the first array and the reference values, and 
wherein each pixel value in the filtered array is equal to 
a corresponding pixel value in the first array if a 
difference between the corresponding value and a cor- 
responding reference value in the reference array is 
greater than a threshold value and is equal to a linear 
combination of the corresponding pixel value and the 
corresponding reference value if the difference is not 
greater than the threshold value. 

17. The method of claim 16, wherein determining refer- 
ence vectors comprises combining of the motion vector for 
the first area containing the pixel and motion vectors for first 
areas that are nearest to the first area containing the pixel. 

18. A method for improving appearance of an image,
comprising:

representing the image using a first array of pixel values;

determining a first range for pixel values in a first block
that is in the first array and includes a target pixel value;

determining a second range for pixel values in a second
block that is in the first array and includes the target
pixel value, wherein the second block is smaller than
the first block;

selecting a spatial filter from a plurality of spatial filters,
wherein the spatial filter is selected according to the
first and second ranges; and

applying the selected spatial filter to the target pixel value,
wherein applying the selected spatial filter combines
the target pixel value with surrounding pixel values in
the first array to generate a corresponding pixel value in
a second array representing the image.

19. The method of claim 18, wherein the second block is
a 3x3 block of pixel values centered on the target pixel
value.

20. The method of claim 19, further comprising perform- 
ing an inverse frequency transformation on a block of 
transform coefficients to determine the pixel values in the 
first block. 

21. The method of claim 18, wherein selecting the spatial
filter comprises:

selecting a first spatial filter in response to the second
range being greater than a first threshold value; and

selecting a second spatial filter in response to the first
range being greater than a second threshold and the
second range being less than a third threshold, wherein
the second spatial filter is stronger than first spatial
filter.

22. The method of claim 18, for each pixel value in the
first array, using that pixel value as the target pixel in a
repetition of the steps of determining the first range,
determining the second range, selecting a spatial filter, and
applying the selected spatial filter.

23. The method of claim 18, wherein applying the
selected spatial filter comprises:

identifying a likeness threshold that corresponds to the
second range; and

excluding from the combination that generates the
corresponding pixel value any pixel values that differ from
the target pixel value by more than the likeness threshold.

24. The method of claim 18, wherein selecting a spatial
filter comprises selecting a filter strength parameter β
corresponding to the first and second ranges.

25. The method of claim 24, wherein:
the target pixel value is Pij;

the corresponding value is Oij and is determined from
pixel values of the first array according to an equation
Oij=(1-β)*Pij+β*F(Pij), where F(Pij) is a linear
combination of one or more pixel values near the target
pixel value in the first array.

26. The method of claim 25, wherein applying the
selected spatial filter comprises identifying a likeness
threshold that corresponds to the second range, and linear
combination F(Pij) excludes pixel values that differ from the
target pixel value by more than the likeness threshold.

27. A method for improving appearance of an image, 
comprising: 

representing the image using a first array of pixel values; 

determining a range of pixel values in a block that is in the 
first array and includes a target pixel value; 

identifying a likeness threshold that corresponds to the 
range determined; and 

generating an output pixel value for a second array 
representing an improved-appearance version of the 
image, the output pixel value being a linear combina- 
tion of the target pixel value and one or more pixel 
values of the first array, the linear combination exclud- 
ing pixel values that differ from the target pixel by more 
than the likeness threshold. 

28. The method of claim 27, wherein the likeness thresh- 
old is linearly related to the range. 

29. A method for improving appearance of a video image, 
comprising: 

decoding a signal to generate a first series of arrays of 
pixel values, wherein each array of pixel values repre- 
sents a frame in the video image and comprises a set of 
blocks; 

applying a block boundary filter to pixel values at bound- 
aries of the blocks in the frames to generate a second 
series of arrays of pixel values, wherein applying the 
block boundary filter leaves unchanged pixel values 
that are not at a boundary of any of the blocks; 

performing a temporal filtering operation that combines 
pixel values from different arrays in the second series 
to generate a third series of arrays of pixel values; and 

applying a spatial filter to the arrays in the third series to 
generate a fourth series of arrays representing the video 
image with improved appearance. 

30. The method of claim 29, wherein the signal comprises 
a plurality of sets of transformation coefficients with each set 
corresponding to a different one of the blocks in the arrays 
of the first series, and decoding comprises for each set of 
transformation coefficients, performing an inverse transfor- 
mation on the set of transformation coefficients to generate 
pixel values in the block corresponding to the set of trans- 
formation coefficients. 

31. The method of claim 29, wherein applying the spatial 
filter comprises: 

filtering each pixel value in an array using a filter that has 

an adjustable parameter; and 
altering the parameter according to content of an area in 

a frame that includes a pixel represented by a pixel 

value being filtered. 

*****

[57] ABSTRACT

A motion adaptive spatio-temporal filtering method is 
employed as a prefilter in an image coding apparatus, which 
processes the temporal band-limitation of the video frame 
signals on the spatio-temporal domain along the trajectories 
of a moving component without temporal aliasing by using 
a filter having a band-limitation characteristic according to 
a desired temporal cutoff frequency and the velocity of 
moving components. 

4 Claims, 6 Drawing Sheets

MOTION ADAPTIVE SPATIO-TEMPORAL
FILTERING OF VIDEO SIGNALS

FIELD OF THE INVENTION

The present invention is directed to a method and an
apparatus for the temporal filtering of video signals; and, in
particular, to a motion adaptive spatio-temporal filter
(MASTF) for use in an image encoding apparatus, capable
of achieving a temporal band limitation without incurring
temporal aliasing effects and thereby obtaining an improved
picture quality.



DESCRIPTION OF THE PRIOR ART



In digital television systems such as video-telephone, 
teleconference and high definition television systems, an 
image coding apparatus has been used to reduce a large 
volume of data defining each frame of video signals by way 
of employing various data compression techniques, for 
example, a transform coding using a Discrete Cosine Trans- 
form, and a motion compensation coding for reducing the 
temporal relationship between two successive frames. 

In order to effectively carry out the data compression
process, most real-time image coding apparatus available in
the art employ various filters as a part of a front-end
processing for the filtering and frame rate reduction. These
filters serve to eliminate or alleviate temporal noises and
perform band limitation to thereby improve the picture
quality and coding efficiency.

One of such prior art apparatus is disclosed in an article
by Eric Dubois et al., "Noise Reduction in Image Sequences
Using Motion-Compensated Temporal Filtering", IEEE
Transactions on Communications, COM-32, No. 7 (July,
1984), which utilizes a nonlinear recursive temporal filter to
reduce noise components which may arise in an initial signal
generation and handling operation. This temporal filter
employs a motion compensation technique to perform the
filtering in the temporal domain along the trajectory of a
motion to thereby reduce noise components in moving areas
without modifying the details of an image.

Another prior art apparatus is described in an article by
Wen-Hsiung Chen et al., "Recursive Temporal Filtering and
Frame Rate Reduction for Image Coding", IEEE Journal on
Selected Areas in Communications, SAC-5 (August, 1987),
which also employs a recursive temporal filter to perform a
recursive filtering and frame rate reduction. This filter, when
applied in the temporal domain, can smooth out frame-to-frame
input noises and improve the picture quality.

U.S. Pat. No. 4,694,342 issued to K. J. Klees provides an 
apparatus which utilizes a spatial filter that can function both 
recursively and non-recursively for removing noises from a 
video image while substantially preserving the details 
thereof. This filter includes a lookup table for storing pre- 
defined and filtered output pixel values and predefined 
feedback pixel values wherein certain portions of an incom- 
ing image are filtered non-recursively to substantially pre- 
serve the image details while certain other portions of the 
same image are filtered recursively to remove noises there- 
from. 

While the above and other prior art apparatus may be
capable of reducing the noises in moving areas without
altering the image details through the use of a lowpass
filtering technique performed along the trajectory of a
motion, such approaches tend to introduce artifacts in those
areas where the motion occurs at a relatively high speed. As
a result, such apparatus are not equipped to adequately deal
with the temporal band limitation or the visual artifacts
resulting from temporal aliasing.

If the repeated spectra include the aliasing components,
visual artifacts appear in the image. In particular, those moving
areas comprised of spatial high frequency components
may distort psychovisual effects: that is, the perceived
velocity on moving areas may differ from the actual velocity.
To achieve an efficient temporal band-limitation, therefore,
it is desirable to have a temporal filter which is not affected
by the aliasing effect.

SUMMARY OF THE INVENTION 

It is, therefore, a primary object of the present invention 
to provide a motion adaptive spatio-temporal filtering 
method capable of effectively performing temporal band- 
limitation of a video signal without incurring temporal 
aliasing and thereby improving the picture quality. 

In accordance with the present invention, there is pro- 
vided a method for filtering a video signal with a predeter- 
mined temporal cutoff frequency to achieve a temporal 
band-limitation thereof, wherein said video signal includes 
a multiplicity of frames each of which having a multiple 
number of pixels, the method for obtaining filtered result for 
a target pixel in a target frame in the video signal which 
comprises the steps of: 

estimating a multiplicity of motion vectors each of
which represents the movement at the target pixel position
in each frame of the video signal;

determining, as a filtering input function, a multiplicity of 
groups of pixel values on trajectories of the target pixel 
wherein each of the groups is determined on the trajectory 
of the target pixel in a corresponding frame through the use 
of the motion vector for the frame; and 

performing a convolution of the filtering input function 
with a predetermined filter impulse response, thereby 
obtaining a filtered video signal which has the predeter- 
mined temporal bandwidth without temporal aliasing. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The above and other objects and features of the instant 
invention will become apparent from the following descrip- 
tion of preferred embodiments taken in conjunction with the 
accompanying drawings, in which: 

FIGS. 1A, 1B and 1C are diagrams illustrating base-band
spectrum distributions as a function of the velocity of a
moving object;

FIG. 2 is a diagram depicting a result of the conventional 
lowpass filtering in the temporal domain with a fixed tem- 
poral cutoff frequency; 

FIG. 3 is a diagram for illustrating a filtering input 
function in the spatio-temporal domain; 

FIGS. 4A to 4D illustrate the result of the motion adaptive 
spatio-temporal filtering in accordance with the present 
invention; and 

FIG. 5 is a schematic block diagram representing an 
image coding apparatus employing the motion adaptive 
spatio-temporal filtering method in accordance with a pre- 
ferred embodiment of the present invention. 

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENTS 

A video signal may be treated in terms of its 3-dimensional,
i.e., horizontal, vertical and temporal components;
and described as a continuous function $f_3(x,y,t)$. Assuming
that its moving objects have only a constant-velocity rigid
translational motion $v=(v_x, v_y)$, the Fourier transform of the
continuous video signal, $F_3(\cdot)$, may be represented as
follows:

$F_3(f_x, f_y, f_t) = F_2(f_x, f_y) \cdot \delta(f_x v_x + f_y v_y + f_t)$          Eq. (1)

wherein $F_2(f_x, f_y)$ is the Fourier transform of a 2-dimensional
video signal $f_2(x,y)$, and $\delta(f_x v_x + f_y v_y + f_t)$ represents a
tilted plane in a 3-dimensional frequency space described
by the equation $f_x v_x + f_y v_y + f_t = 0$ so that the baseband exists
only on a 2-dimensional frequency plane. Eq. (1) is disclosed
in, e.g., an article by R. A. F. Belfor, et al., "Motion
Compensated Subsampling of HDTV", SPIE, 1605, Visual
Communications and Image Processing '91, pp 274-284
(1991). From the location of a baseband spectrum, a
spatio-temporal bandwidth can be anticipated. That is, if a temporal
bandwidth $f_t^w$ is given, the relationship among the temporal
bandwidth $f_t^w$, the spatial bandwidths $f_x^w$ and $f_y^w$, and the
velocity components $v_x$ and $v_y$ is obtained from Eq. (1) as
follows:



wherein $f_x^w$ and $f_y^w$ are the respective spatial bandwidth
components in x and y directions. From Eq. (2), it can be
seen that the temporal bandwidth is proportional to the
velocity of the moving objects; and when the temporal
bandwidth is fixed, the spatial bandwidth becomes inversely
proportional to the velocity of the moving object.
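The body of Eq. (2) did not survive in this copy. One reading consistent with the two properties just stated, offered here as an assumption rather than as the patent's own formula, comes from evaluating the plane $f_x v_x + f_y v_y + f_t = 0$ at the spatial band edges and taking magnitudes:

```latex
f_t^{w} = f_x^{w}\, v_x + f_y^{w}\, v_y
```

Under this reading, a purely horizontal motion of $v_x = 2$ pixels/frame with $f_x^w = 0.2$ cycles/pixel gives $f_t^w = 0.4$ cycles/frame, and halving the velocity halves the temporal bandwidth.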

Since the video signal for the filtering is sampled with
spatial and temporal sampling frequencies, the sampled
video signal is represented as 3-dimensional sampled data,
i.e., pixels. Therefore, sampling of the continuous function
$f_3(\cdot)$ may be expressed by multiplying the continuous function
$f_3(x,y,t)$ with a 3-dimensional array of delta functions.
A spectrum distribution of the pixels may be then given by
the convolution of the Fourier transform of $f_3(\cdot)$ and a delta
function. As a result, the spectrum of the pixels is replicated
at intervals of the sampling frequencies by the characteristics
of the delta function.

Referring first to FIGS. 1A, 1B, and 1C, there are shown
baseband spectrum distributions as a function of the velocity
of a moving object, $v_x$=1 pixel/frame interval, $v_x$=2 pixels/
frame interval and $v_x$=3 pixels/frame interval, wherein solid
lines indicate the replicas of a baseband; the temporal
sampling frequency is normalized to 1; and the spatial (x
axis direction) and temporal frequencies are designated as $f_x$
and $f_t$, respectively.

The motion of a pixel A in the moving object causes the
spectrum to become skewed from the spatial frequency axis
as shown in FIG. 1A. As shown in FIGS. 1A, 1B and 1C, the
angle θ of said skewing increases as does the velocity. From
Eq. (2), the reason for the skewing can be readily understood
by considering the temporal frequency at a pixel in the video
signal: since the spectrum distribution on the spatio-temporal
frequency domain is related to the product of the spatial
frequency and the speed of the moving object, a higher
velocity of the moving object gives rise to a higher temporal
frequency. It should be stressed that the spectrum is skewed
and not rotated.

Referring to FIG. 2, results of lowpass filtering in the
temporal domain with a fixed temporal cutoff frequency
are illustrated. In order to perform the temporal filtering, two
assumptions may be made as follows: first, the baseband
spectrum has no spatial aliasing components, and secondly, for
the sake of simplicity, there exists only purely horizontal
motion (represented in terms of $f_x$) with a constant velocity.
In FIG. 2, the filtered result contains, e.g., spatial high
frequency components B of adjacent spectra which represent
temporal aliasing. That is, the spatial high frequency
components affect the temporal low frequency components of
the adjacent replicas. In other words, a disturbance between
the spatial high frequency components and the low
frequency ones of the adjacent replicas appears in the displayed
image.

As may be seen from Eqs. (1) and (2), the relation
between the spatial (including the vertical and the horizontal
components) and temporal frequencies $f_s$ and $f_t$ is
represented as follows:



Eq. (3) 



wherein the spatial frequency $f_s$ is defined on the $f_x$-$f_y$ plane.
As is seen from Eq. (3), it should be appreciated that, when
the temporal cutoff frequency is fixed in order to limit the
temporal bandwidth, the spatial cutoff frequency becomes
inversely proportional to the absolute value of the velocity
of the moving object.

Assuming that $h(\cdot)$ is an impulse response of a lowpass
temporal filter and, for simplicity, there exists only a purely
horizontal motion (x axis direction), then the temporal
band-limited video signal $g(x,t)$ may be represented as
follows:

$$g(x,t)=\int_{-\infty}^{\infty} h(\tau)\,f(x,\,t-\tau)\,d\tau \qquad \text{Eq. (4)}$$


wherein a linear phase filter is used to reduce the effect of
a group-delay of a filter response. From the assumption of
constant-velocity rigid translational motion $\mathbf{v}=(v_x, v_y)$ and
purely horizontal motion, a filtering input function may be
represented as follows:

$$f(x,\,t-\tau)=f(x+v_x\tau,\,t) \qquad \text{Eq. (5)}$$


From Eq. (5), the displacement of the moving pixel along
the temporal frequency axis can be represented by its
trajectory in the spatial domain at a point on the temporal
axis. Thus, Eq. (4) may be rewritten as:

$$g(x,t)=\int_{-\infty}^{\infty} h(\tau)\,f(x+v_x\tau,\,t)\,d\tau \qquad \text{Eq. (6)}$$


On the other hand, in the case of a real video signal, the
assumption of constant-velocity rigid translational motion is
not always valid. Furthermore, even in the case that there is no
moving object, each pixel value of the video data signal varies
with time due to, e.g., changes in the lighting source and the
characteristics of the video signal generating device, such as a
video camera. In such cases, Eq. (5) holds true only for a
short period of time and can be rewritten as:

$$f\big(x,\,t-(k+1)\Delta t\big)=f\big(x+v_x(x,\,t-k\Delta t)\cdot\Delta t,\;t-k\Delta t\big) \qquad \text{Eq. (7)}$$

wherein Δt denotes a short period of time, e.g., a frame
interval, and k is an integer. In accordance with Eq. (7),
Eq. (6) can be rewritten as:

$$g(x,t)=\sum_{k=-\infty}^{\infty}\int_{k\Delta t}^{(k+1)\Delta t} h(\tau)\,f\big(x+v_x(x,\,t-k\Delta t)\,(\tau-k\Delta t),\;t-k\Delta t\big)\,d\tau \qquad \text{Eq. (8)}$$


From Eq. (8), it can be appreciated that the temporal 
filtering of Eq. (4) can be achieved by spatio-temporal 
filtering with its filtering input function f(). 

Eq. (8) is a continuous description of the motion adaptive
spatio-temporal filtering. Similar results hold in the discrete
case: the integral is replaced by summation and $d\tau$ is
represented by Δt and l. Eq. (8) is then given by

$$g(\mathbf{x},n)=\sum_{j=-N}^{N}\sum_{l=0}^{L-1} h(Lj+l)\,f\big(\mathbf{x}+\mathbf{v}(\mathbf{x},\,n-j)\cdot\Delta t\cdot l,\;n-j\big) \qquad \text{Eq. (9)}$$


wherein n is a frame index; the velocity and the filtering
positions are replaced by vectors $\mathbf{v}$ and $\mathbf{x}$; the filter impulse
response $h(\cdot)$ comprising (2N+1)×L filter coefficients is
predetermined in conjunction with the temporal cutoff
frequency and the predetermined numbers N and L (N and L are
positive integers); and, if we denote a pixel-to-pixel interval as
Δx, Δt is selected to satisfy $|\mathbf{v}(\cdot)\cdot\Delta t| \le |\Delta x|$ (if Δt fails to
satisfy the condition, it may cause spatial aliasing).

Therefore, as may be seen from Eq. (9), the temporal
band-limitation can be achieved by spatio-temporal filtering,
i.e., lowpass filtering of the filtering input function taken
from both spatial and temporal domains.

On the other hand, if ΔT is a frame to frame interval, then
LΔt is equal to ΔT and $\mathbf{v}(\cdot)\cdot\Delta T$ is equal to $\mathbf{D}(\cdot)$, which is a
motion vector representing a displacement of a pixel
between two neighboring frames. Then, Eq. (9) can be
modified as follows:

$$g(\mathbf{x},n)=\sum_{j=-N}^{N}\sum_{l=0}^{L-1} h(Lj+l)\,f\Big(\mathbf{x}+\mathbf{D}(\mathbf{x},\,n-j)\cdot\frac{l}{L},\;n-j\Big) \qquad \text{Eq. (10)}$$

wherein L is selected to satisfy $|\mathbf{D}(\cdot)| \le |\Delta x|\cdot L$ (this
condition is equivalent to the condition $|\mathbf{v}(\cdot)\cdot\Delta t| \le |\Delta x|$ described
earlier; therefore, if L fails to satisfy this condition, it may
cause spatial aliasing). Eq. (10) is an implementation of Eq.
(9). The temporal band-limitation is achieved by spatio-temporal
filtering, i.e., lowpass filtering on the filtering input
function $f(\cdot)$ which comprises a multiplicity of, e.g., (2N+1),
groups of filtering input data, wherein each group includes a
predetermined number of, e.g., L, filtering input data which
are obtained from pixel values of the corresponding frame in the
video signal. In Eq. (10), $\mathbf{x}+\mathbf{D}(\mathbf{x},\,n-j)\cdot l/L$, which denotes a
position of filtering input data in the (n-j)th frame of the video
signal, may not coincide with exact pixel positions. In that
case, the filtering input data can be determined from adjacent
pixels located around the position by using, e.g., a bilinear
interpolation method which determines a weighted sum of
the adjacent pixel values as the filtering input data. That is,
the filtering input function is obtained on the spatio-temporal
domain along the trajectories of the moving object.
Specifically, a group of input data included in the filtering input
function $f(\cdot)$ may be determined from the pixel values of a
corresponding frame using the motion vector which represents
the displacement of the moving object between the
frame and its previous frame in the video signal, as will be
described in conjunction with FIG. 3.
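
The way one group of filtering input data is gathered along a motion trajectory can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: frames are modelled as lists of rows, positions outside the frame are clamped to the border (the text does not say how borders are handled), and the function names `bilinear_sample` and `filtering_inputs` are hypothetical.

```python
import math

def bilinear_sample(frame, x, y):
    """Weighted sum of the four pixels surrounding the fractional position (x, y).
    Assumption: positions outside the frame are clamped to the border."""
    height, width = len(frame), len(frame[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * frame[y0][x0] + ax * frame[y0][x1]
    bottom = (1 - ax) * frame[y1][x0] + ax * frame[y1][x1]
    return (1 - ay) * top + ay * bottom

def filtering_inputs(frame, x, y, motion_vector, L):
    """One group of L filtering input data taken at (x, y) + D * l / L, l = 0 .. L-1."""
    dx, dy = motion_vector
    return [bilinear_sample(frame, x + dx * l / L, y + dy * l / L) for l in range(L)]

# A horizontal ramp: a motion vector of 2 pixels with L = 4 samples every half pixel.
ramp = [[0, 4, 8, 12]] * 3
print(filtering_inputs(ramp, 0, 1, (2, 0), 4))
```

Positions that fall between pixels receive interpolated values, so each group behaves like a set of samples from temporally upsampled frames.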

On the other hand, the filter impulse response comprising
a plurality, i.e., (2N+1)×L, of filter coefficients serves to
limit the bandwidth of the video signal to a predetermined
bandwidth. These filter coefficients may be predetermined
based on a desired temporal cutoff frequency and the
predetermined numbers N and L. For example, when the temporal
cutoff frequency is $f_t^c$, the filter impulse response is designed
with a spatio-temporal cutoff frequency of $f_t^c/L$.
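
The text does not state how the (2N+1)×L coefficients are designed. As one possible illustration only, the sketch below uses a Hamming-windowed sinc, a standard linear-phase lowpass design chosen here as an assumption, with the cutoff set to the temporal cutoff divided by L. The function name `lowpass_taps` and the normalization to unit DC gain are likewise assumptions.

```python
import math

def lowpass_taps(temporal_cutoff, N, L):
    """(2N+1)*L Hamming-windowed sinc taps with cutoff temporal_cutoff / L,
    expressed in cycles per sample of the L-times upsampled sequence.
    The taps are symmetric (linear phase) and normalized to sum to 1.
    N and L are assumed to be positive integers, as in the text."""
    count = (2 * N + 1) * L
    cutoff = temporal_cutoff / L
    center = (count - 1) / 2.0
    taps = []
    for m in range(count):
        t = m - center
        # Ideal lowpass impulse response sin(2*pi*fc*t) / (pi*t), with the limit 2*fc at t = 0.
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * m / (count - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [tap / total for tap in taps]

taps = lowpass_taps(0.25, N=1, L=4)
print(len(taps))
```

With N = 1 and L = 4 this yields twelve coefficients, matching the (2N+1)×L count given above.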

Actually, as may be seen from Eq. (10), the filtered data 
g(), i.e., band-limited data, is obtained by convolving each 
group of filtering input data with corresponding filter coef- 
ficients and by summing each group of filtered input data. 
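
A one-dimensional sketch of the double sum in Eq. (10) is given below. It is a hedged illustration rather than the actual circuit: frames are single rows of pixels, fractional positions are linearly interpolated with border clamping, `motion[n][x]` is assumed to hold the displacement D(x, n), and the coefficient h(Lj+l) is assumed to be stored at index (j+N)·L+l of a flat list. All names are hypothetical.

```python
def sample_1d(row, pos):
    """Linear interpolation of a row at a fractional position, clamped to the row."""
    pos = min(max(pos, 0.0), len(row) - 1.0)
    i0 = int(pos)
    i1 = min(i0 + 1, len(row) - 1)
    a = pos - i0
    return (1 - a) * row[i0] + a * row[i1]

def filtered_pixel(frames, motion, h, x, n, N, L):
    """g(x, n) = sum over j in [-N, N] and l in [0, L-1] of
    h(L*j + l) * f(x + D(x, n-j) * l / L, n-j)."""
    total = 0.0
    for j in range(-N, N + 1):
        displacement = motion[n - j][x]
        for l in range(L):
            coeff = h[(j + N) * L + l]  # assumed storage layout for h(L*j + l)
            total += coeff * sample_1d(frames[n - j], x + displacement * l / L)
    return total

frames = [[10.0, 20.0, 30.0, 40.0]] * 3
uniform = [1.0 / 12] * 12
still = [[0] * 4] * 3
print(filtered_pixel(frames, still, uniform, x=2, n=1, N=1, L=4))
```

For a static scene with zero motion vectors, every group collapses to the target pixel value, so a unit-gain filter returns that value unchanged.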

Referring to FIG. 3, there is shown an explanatory diagram
illustrating the filtering input function for the motion
adaptive spatio-temporal filtering method of the present
invention. For the sake of simplicity, each frame is denoted
as a line, e.g., $F_{c-1}$, $F_c$ and $F_{c+1}$, and N and L of Eq. (10) are
assumed to be 1 and 4, respectively. In other words, to obtain
the filtered data for a target pixel in a target frame $F_c$, three
filtering input frames, i.e., the target frame $F_c$ containing the
target pixel to perform the filtering operation thereon and its two
neighboring frames $F_{c-1}$ and $F_{c+1}$, are used for the filtering
process, wherein c-1, c, and c+1 denote frame indices; and
four filtering input data are determined on each filtering
input frame based on the motion vector for the pixel at the
target pixel position in its subsequent frame. The position of
the target pixel is denoted as $x_{10}$, $x_{20}$ and $x_{30}$ in the frames
$F_{c-1}$, $F_c$ and $F_{c+1}$, respectively, and the vertical axis is a time
axis.

In order to obtain the filtered data for the target pixel at $x_{20}$
in the target frame $F_c$, a multiplicity of, i.e., three, groups of
filtering input data are decided, each group including a
predetermined number, e.g., 4, of filtering input data located
on the corresponding motion trajectory for the target pixel in
the corresponding filtering input frame. Specifically, three
groups of filtering input data positioned at ($x_{10}$, $x_{11}$, $x_{12}$,
$x_{13}$), ($x_{20}$, $x_{21}$, $x_{22}$, $x_{23}$) and ($x_{30}$, $x_{31}$, $x_{32}$, $x_{33}$) are
determined on the trajectories of the pixels at the target pixel
position based on the motion vectors $D(x_{10}, c-1)$, $D(x_{20}, c)$
and $D(x_{30}, c+1)$ in the frames $F_{c-1}$, $F_c$ and $F_{c+1}$, respectively.
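
Assuming the sample spacing of Eq. (10) applies directly to this example (N = 1, L = 4), the twelve positions named above could be written out as follows. This is a worked restatement under that assumption, not an additional equation of the specification:

```latex
\begin{aligned}
x_{1l} &= x_{10} + D(x_{10},\,c-1)\cdot\tfrac{l}{4},\\
x_{2l} &= x_{20} + D(x_{20},\,c)\cdot\tfrac{l}{4},\\
x_{3l} &= x_{30} + D(x_{30},\,c+1)\cdot\tfrac{l}{4},
\qquad l = 0, 1, 2, 3 .
\end{aligned}
```

For instance, a motion vector of 2 pixels would place the four samples of a group at offsets of 0, 0.5, 1 and 1.5 pixels from the target pixel position, so the half-pixel positions would have to be interpolated.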

As shown in FIG. 3, it is readily appreciated that the
filtering input data are equivalent to the target pixel values
in temporally interpolated or upsampled frames of the video
signal. For instance, the filtering input data at $x_{11}$ in the
frame $F_{c-1}$ is equivalent to the pixel value at $x_{10}$ at time
$t=-3\Delta T/4$. That can be denoted as:

Eq. (11)

The equivalence between the spatial domain and the time
domain is denoted as a dotted line in FIG. 3.

Referring now to FIGS. 4A to 4D, there is shown the
result of the lowpass temporal filtering of the video signal on
a spatio-temporal domain through the use of the motion
adaptive spatio-temporal filtering method. In FIG. 4A, there
is shown a baseband spectrum of the original video signal.
As described above, the process of obtaining each group of
filtering input data is equivalent to temporal upsampling or
interpolating as illustrated in FIG. 4B. If the desired cutoff
frequency of the temporal lowpass filtering is $f_t^c$, the
spatio-temporal cutoff frequency $f_c$ of the filter of the present
invention is $f_t^c/L$ as shown in FIG. 4C. The final spectra for
the filtered results are shown in FIG. 4D, which are the
subsampled versions of the spectra in FIG. 4C (note that the
filtered results are not provided for the interpolated frames).
Comparing with the temporal band-limitation depicted in
FIG. 2, it should be readily appreciated that the
spatio-temporal band-limitation of the present invention is not
affected by temporal aliasing components.

As may be seen from Eq. (10) and FIGS. 3, 4A, 4B,
4C, and 4D, it should be appreciated that the filtering
operation is performed on a spatio-temporal domain along
the trajectory of moving objects to thereby achieve a
temporal band limitation. Therefore, the temporal aliasing,
which may occur in the repeated spectra when the velocity
of the moving objects is increased, can be effectively
eliminated by the inventive filter to thereby greatly reduce the
visual artifacts appearing in the moving areas in the image.

Referring now to FIG. 5, there is shown an image coding
apparatus employing the motion adaptive spatio-temporal
filter in accordance with a preferred embodiment of the
present invention. The image coding apparatus comprises a
filtering circuit 100 for performing the motion adaptive
spatio-temporal filtering in accordance with the present
invention and a video encoding circuit 60 for eliminating
redundancies in the filtered video signal in order to compress
the video signal to a more manageable size for the
transmission thereof. The video signal is generated from a video
signal source, e.g., a video camera (not shown), and fed to the
filtering circuit 100.

The filtering circuit 100 performs the motion adaptive
spatio-temporal filtering operation, as previously described,
in accordance with Eq. (10). The filtering circuit 100
includes a frame buffer 10, a motion estimator 20, a motion
vector buffer 30, a filtering input formatter 40 and a filtering
calculator 50. The frame buffer 10 stores a current frame
which is being inputted to the filtering circuit 100 and a
multiplicity of, e.g., (2N+1), previous frames, i.e., filtering
input frames to be used in a filtering procedure. Specifically,
assuming that N=1, the frame buffer 10 stores the current
frame $F_{c+2}$ and three filtering input frames $F_{c-1}$, $F_c$ and $F_{c+1}$,
wherein c+2, c+1, c, and c-1 are frame indices. The motion
estimator 20 receives two successive frames of the video
signal, i.e., the current frame $F_{c+2}$ of the video signal
inputted directly from the video signal source and its
previous frame $F_{c+1}$ stored in the frame buffer 10, and extracts
motion vectors associated with each of the pixels included in
the current frame $F_{c+2}$. In order to extract motion vectors,
various motion estimation methods, as well known in the art,
may be employed (see, e.g., MPEG Video Simulation Model
Three, International Organization for Standardization,
Coded Representation of Picture and Audio Information,
1990, ISO-IEC/JTC1/SC2/WG8 MPEG 90/041).

The extracted motion vectors are coupled to the motion
vector buffer 30 to be stored therein. In accordance with the
present invention, the motion vector buffer 30 stores motion
vectors for the frames $F_{c+2}$, $F_{c+1}$, $F_c$ and $F_{c-1}$.
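
The buffering described above can be pictured with a small sliding window. The sketch below is illustrative only and makes assumptions not found in the text: the class name `FrameBuffer` and its methods are hypothetical, and frames are treated as opaque objects. For N = 1 it holds four frames, the newest being the current frame and the three older ones the filtering input frames, with the target frame in the middle of those three.

```python
from collections import deque

class FrameBuffer:
    """Sliding window holding the 2N+1 filtering input frames plus the current frame."""

    def __init__(self, N):
        self.N = N
        self.frames = deque(maxlen=2 * N + 2)

    def push(self, frame):
        """Store a newly inputted frame; return True once the window is full."""
        self.frames.append(frame)
        return len(self.frames) == self.frames.maxlen

    def filtering_input_frames(self):
        """The 2N+1 frames preceding the current (newest) frame."""
        return list(self.frames)[:-1]

    def target_frame(self):
        """The frame in the middle of the filtering input frames."""
        return self.frames[self.N]

buffer = FrameBuffer(N=1)
for name in ("F0", "F1", "F2", "F3"):
    ready = buffer.push(name)
print(ready, buffer.target_frame(), buffer.filtering_input_frames())
```

A motion vector buffer could be kept in step with the same window, one set of vectors per stored frame.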

The filtering input frames stored in the frame buffer 10
and the motion vectors associated with the filtering input
frames stored in the motion vector buffer 30 are coupled to
the filtering input formatter 40. The filtering input formatter
40 determines a multiplicity, e.g., 3, of groups of filtering
input data which constitute the filtering input function $f(\cdot)$ in
Eq. (10). As described above, in case filtering input data is
determined to be located at a position which does not fall on
the exact pixel position, the filtering input formatter 40
provides the filtering input data by calculating a weighted
sum of the four neighboring pixels thereof. The filtering
input data are coupled to the filtering calculator 50.

At the filtering calculator 50, the filtered data $g(\cdot)$ is
calculated as represented by Eq. (10) using the filtering input
data inputted from the filtering input formatter 40.

The filter impulse response comprising a plurality of, e.g.,
(2N+1)×L, filter coefficients is determined according to the
desired temporal cutoff frequency $f_t^c$, N and L, which are
predetermined so as to satisfy the condition described earlier
in conjunction with Eq. (10) by considering the
characteristics of the video signal. The filter coefficients may be
predetermined prior to the filtering process and stored in the
filtering calculator 50. As described above, the filtering
circuit 100 performs the motion adaptive spatio-temporal
filtering operation to thereby obtain a temporal band-limited
video signal.

The filtered video signal outputted from the filtering
calculator 50 is coupled to the video encoding circuit 60
wherein the video signal is compressed by various methods
known in the art (see, e.g., MPEG Video Simulation Model
Three, International Organization for Standardization,
Coded Representation of Picture and Audio Information,
1990, ISO-IEC/JTC1/SC2/WG8 MPEG 90/041). The
encoded video signal is coupled to a transmitter for the
transmission thereof.

While the present invention has been shown and described
with reference to the particular embodiments, it will be
apparent to those skilled in the art that many changes and
modifications may be made without departing from the spirit
and scope of the invention as defined in the appended
claims.

What is claimed is: 

1. An apparatus for providing a filtered data for each of 
pixels of a video signal by filtering the video signal with a 
predetermined temporal cutoff frequency to achieve a tem- 
poral band limitation thereof, wherein said video signal 
comprises a multiplicity of filtering input frames which 
include a target frame to perform a filtering operation 
thereon and a predetermined number of preceding frames 
and subsequent frames of said target frame, each of the 
filtering input frames having a multiple number of pixels, 
comprising: 

means for estimating a plurality of motion vectors each of 
which represents the movement for each of the pixels 
included in the video signal; 

means for determining a filtering input function for a 
target pixel included in the target frame, wherein the 
filtering input function includes a multiplicity of groups 
of filtering input data; each group of the filtering input 
data is determined on a trajectory of a pixel at the target 
pixel position in each of the multiplicity of filtering 
input frames based on a motion vector of the pixel at 
the target pixel position; 

means for performing a convolution of the filtering input
function with a filter impulse response determined
according to a spatio-temporal cutoff frequency $f_c$
which is represented as:

$$f_c=\frac{f_t^c}{L}$$

wherein $f_t^c$ is the temporal cutoff frequency; and L is a
predetermined positive integer related to the velocity of
a moving object in the video signal, thereby obtaining
filtered data for the target pixel in the target frame.
2. The apparatus of claim 1, wherein said filtered data is
represented as follows:

$$g(\mathbf{x},n)=\sum_{j=-N}^{N}\sum_{l=0}^{L-1} h(Lj+l)\,f\Big(\mathbf{x}+\mathbf{D}(\mathbf{x},\,n-j)\cdot\frac{l}{L},\;n-j\Big)$$

wherein x is the position of the target pixel; n is the index
of the target frame in the video signal; the filter impulse
response $h(\cdot)$ includes (2N+1)×L filter coefficients; j is
an index whose absolute value is not greater than N; N,
L are positive integers; and $\mathbf{D}(\cdot)$ is a motion vector
representing a motion for the target pixel.

3. A method for providing a filtered data for a target pixel
in a video signal by filtering the video signal with a
predetermined temporal cutoff frequency to achieve a
temporal band limitation thereof, wherein said video signal
comprises a multiplicity of filtering input frames which
include a target frame having the target pixel therein and a
predetermined number of preceding frames and subsequent
frames of said target frame, each of the filtering input frames
having a multiple number of pixels, comprising the steps of:

estimating a multiplicity of motion vectors each of which
represents the movement for each of the pixels at the
target pixel position in each frame of the video signal;

determining a filtering input function for the target pixel,
wherein the filtering input function includes a
multiplicity of groups of filtering input data; each group of
the filtering input data is determined on a trajectory of
a pixel at the target pixel position in each of the
multiplicity of filtering input frames based on a motion
vector of the pixel at the target pixel position; and

performing a convolution of the filtering input function
with a filter impulse response determined according to
a spatio-temporal cutoff frequency $f_c$ which is
represented as:

$$f_c=\frac{f_t^c}{L}$$

wherein $f_t^c$ is the temporal cutoff frequency; and L is a
predetermined positive integer related to the velocity of
a moving object in the video signal, thereby obtaining
filtered data for the target pixel in the target frame.

4. The method of claim 3, wherein said filtered data is
represented as follows:

$$g(\mathbf{x},n)=\sum_{j=-N}^{N}\sum_{l=0}^{L-1} h(Lj+l)\,f\Big(\mathbf{x}+\mathbf{D}(\mathbf{x},\,n-j)\cdot\frac{l}{L},\;n-j\Big)$$

wherein x is the position of the target pixel; n is the index
of the target frame in the video signal; the filter impulse
response $h(\cdot)$ includes (2N+1)×L filter coefficients; j is
an index whose absolute value is not greater than N; N,
L are positive integers; and $\mathbf{D}(\cdot)$ is a motion vector
representing a motion for the target pixel.

*****
[57] ABSTRACT

A device for deriving an ancillary signal from a compressed
digital video signal (e.g. MPEG), wherein the ancillary
signal includes selected parts of the main signal, for
example, the DC coefficients of I-pictures, or the
unscrambled parts, which parts can be used for display in a
(multi-)picture-in-picture television receiver, or as an
"appetizer" in order to encourage the user to pay a
subscription fee, is described. The ancillary signal can separately be
recorded in digital video recorders so as to assist the user in
finding the beginning of a scrambled program on tape. The
ancillary signal can also be generated at the transmitter end
and transmitted at a low bit rate. A decoder for decoding
such an ancillary signal is considerably simpler and less
expensive than a full-spec MPEG decoder. A decoding
method is also described.
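
The selection described in the abstract (keeping only the DC coefficients of I-pictures) can be sketched on already-parsed data. This is a hypothetical illustration, not a bitstream decoder: the dictionary layout with `"type"` and `"blocks"` keys and the function name `ancillary_signal` are assumptions made for the example, and each block is modelled as a list of coefficients whose first entry is the DC coefficient.

```python
def ancillary_signal(pictures):
    """Keep only the DC (first) coefficient of every block of every I-picture.
    P- and B-pictures and all non-DC coefficients are dropped."""
    selected = []
    for picture in pictures:
        if picture["type"] != "I":
            continue  # predictively encoded pictures are ignored
        selected.append([block[0] for block in picture["blocks"]])
    return selected

main_signal = [
    {"type": "I", "blocks": [[100, 3, -2], [90, 1]]},
    {"type": "P", "blocks": [[5, 5]]},
    {"type": "I", "blocks": [[80]]},
]
print(ancillary_signal(main_signal))
```

Because nothing beyond picture type detection and the first coefficient of each block is needed, no motion compensation or reference picture storage is involved in this selection.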

22 Claims, 4 Drawing Sheets

METHOD AND APPARATUS FOR MULTIPLEXING AND TRANSMITTING
AUTONOMOUSLY/INTRA CODED PICTURES ALONG WITH THE MAIN OR I,
P, B PICTURES/VIDEO

CROSS REFERENCE TO RELATED APPLICATIONS

This is a continuation of application Ser. No. 08/422,378
filed Apr. 14, 1995 now abandoned.

FIELD OF THE INVENTION

The invention relates to an arrangement for decoding a
digital video signal encoded as an MPEG elementary video
stream. The invention also relates to television receivers,
video recorders, and transmitters comprising such an
arrangement.

BACKGROUND OF THE INVENTION

An arrangement for decoding a digital video signal is
disclosed in "ISO/IEC CD 13818: Information
technology-Generic coding of moving pictures and associated
audio information", Dec. 1, 1993, further referred to
as the MPEG standard. Part 1 of this standard relates to the
system aspects of digital transmission, Part 2 relates more
particularly to video encoding.

MPEG2 is a packet-based time multiplex system. Data is
transmitted in transport packets. Each transport packet
contains data from exactly one elementary stream with which it
is associated by means of its packet identifier. Examples of
elementary streams are video streams, audio streams, and
data streams. One or more elementary streams sharing the
same time base make up a program. A typical program might
consist of one video stream and one audio stream. One or
more programs constitute a transport stream.

OBJECT AND SUMMARY OF THE INVENTION

It is, inter alia, an object of the invention to provide an
arrangement which renders it possible to implement new and
known features in a more cost-effective manner.

According to the invention, the arrangement for decoding
an MPEG elementary video stream is characterized in that
the arrangement comprises means for decoding selected
parts of said elementary stream, and means for rearranging
said selected parts so as to constitute an ancillary video
signal. As only selected parts of the elementary signal are
decoded, the arrangement is considerably simpler and less
expensive than a full-spec MPEG decoder.

As is known in the prior art, an MPEG encoded video
signal includes autonomously encoded pictures (I-pictures)
and predictively encoded pictures (P-pictures and
B-pictures). The selected parts constituting the ancillary
signal may be, for example, the I-pictures. In that case, the
arrangement is simple because motion compensation
circuitry and a large amount of memory can be dispensed with.
An embodiment of the arrangement is characterized in that
said selected parts are the DC coefficients of autonomously
encoded pictures. Such an arrangement is extremely simple.
MPEG also allows parts of the signal to be scrambled,
whereas other parts remain unscrambled. A further
embodiment of the arrangement is characterized in that the selected
parts are the unscrambled parts of the video signal.

The arrangement provides an ancillary video signal having
a lower quality than the main signal from which it is
derived. Various applications thereof are conceivable. A
picture-in-picture television receiver, for example, may
comprise the arrangement so as to obtain the ancillary signal
for display as the picture-in-picture. In a multi-picture-in-picture
television receiver, the arrangement may be used to
decode a plurality of elementary video streams, and
simultaneously display the respective ancillary signals as a mosaic
picture. In a video recorder, the arrangement may be used to
obtain a low-quality version of a video signal for separate
recording, to be reproduced at higher playback speeds.
If the ancillary signal comprises the unscrambled parts of a
[illegible] may be
viewed free of charge but at a low quality. The ancillary
signal thus acts as an "appetizer", attracting the viewer's
attention to the presence and contents of the main signal.
When recorded simultaneously with the main signal on a
digital video recorder, the ancillary signal also assists the
user in finding the beginning of a scrambled program on
tape.

The arrangement may also be used in transmitters.
According to the invention, a transmitter for transmitting a
digital video signal encoded as an MPEG elementary video
stream is characterized in that the transmitter comprises the
arrangement for decoding said video signal, and means for
transmitting the ancillary video signal as a further
elementary video stream. The MPEG standard allows a program to
comprise more than one elementary video stream. The
ancillary video signal thus transmitted may serve the
purposes mentioned before. For decoding the ancillary signal, a
simple decoder is adequate. The transmitted ancillary signal
requires only a low bitrate.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a diagram of an arrangement for carrying
out the method according to the invention.

FIG. 2 shows a flow chart illustrating the operation of the
arrangement shown in FIG. 1.

FIG. 3 shows a diagram of another embodiment of the
arrangement for carrying out the method according to the
invention.

FIG. 4 shows a flow chart illustrating the operation of the
arrangement shown in FIG. 3.

FIG. 5 shows a diagram of an arrangement for transmitting
the ancillary video signal as an elementary MPEG
bitstream of the same program as the main video signal.

FIG. 6 shows a diagram of a digital picture-in-picture
television receiver according to the invention.

FIG. 7 shows a diagram of a digital multi-picture-in-picture
television receiver according to the invention.

FIGS. 8A and 8B show embodiments of a digital video
recorder according to the invention.

DESCRIPTION OF EMBODIMENTS

FIG. 1 shows a diagram of an arrangement according to
the invention. The arrangement comprises a variable-length
decoder 10 (hereinafter VLD), an inverse quantizer 11 and
a picture memory 12. The arrangement receives an
elementary video stream representing a main video signal Vm and
derives therefrom an ancillary video signal Va. The main
video signal Vm is assumed to have been encoded according
to "ISO/IEC CD 13818-2: Information technology—
Generic coding of moving pictures and associated audio
information— Part 2: Video", Dec. 1, 1993, also referred to
as the MPEG2 video coding standard. For understanding the
invention, it suffices to mention that the main signal Vm
includes autonomously encoded pictures (I-pictures) and
predictively encoded pictures (P- and B-pictures). Each
picture has been divided into blocks of 8*8 pixels and each
block has been transformed to spectral coefficients. The
relevant coefficients are subjected to a combination of
Huffman coding and runlength coding. Four luminance blocks
and associated chrominance blocks constitute a macroblock
and a plurality of macroblocks constitute a slice. The first
(DC) coefficient of blocks of I-pictures represents the
average luminance and chrominance of an 8*8 pixel block. The
bitstream Vm further includes overhead information such as
syncwords, picture type parameters, and the like.

The operation of the arrangement shown in FIG. 1 will
now be elucidated with reference to a flow chart shown in
FIG. 2. In a first step 20, the VLD reads the input bitstream
and discards all data until a picture start code is encountered.
Data defining a picture is now being received. In a step 21,
the picture coding type accommodated in the picture header
is decoded. In a step 22, it is established whether said picture
coding type indicates that an I-picture is being received. If
that is not the case, the VLD returns to step 20 to await the
next picture start code. If the picture is an I-picture, the VLD
successively awaits the reception of a slice header (step 23)
and the reception of a macroblock (step 24).

In a step 25, the VLD decodes and outputs the DC
coefficient of a block within the current macroblock. In a
step 26, the subsequent coefficients up to the detection of an
end-of-block code are discarded. In a step 27, it is
ascertained whether all blocks of a macroblock have been
processed. As long as that is not the case, the VLD returns to
step 25. In a step 28, it is ascertained whether all
macroblocks of a slice have been processed. As long as that is not
the case, the VLD returns to step 24. Finally, it is ascertained
in a step 29 whether all slices of the picture have been
processed. As long as that is not the case, the VLD returns
to step 23. If all slices have been processed, the VLD returns
to step 20 in order to search the next I-picture in the
bitstream.

The VLD thus extracts the DC coefficients of I-pictures
from the input bitstream. As shown in FIG. 1, said
coefficients are supplied to the inverse quantizer 11 and then
stored in memory 12. Each DC coefficient represents the
average luminance and chrominance value of an 8*8 pixel
block of the main video I-pictures. The ancillary video
signal is obtained by reading out said memory with an
appropriate time basis.

In an alternative embodiment, steps 25 and 26 are
modified so as to decode all coefficients of a block. In that case,
the ancillary signal comprises I-pictures and is a temporally
reduced version of the main signal.

FIG. 3 shows a diagram of another embodiment of the
arrangement for carrying out the method according to the
invention. In this arrangement the bitstream representing the
main video signal Vm is supplied to a memory 30 and
variable-length decoder 31. The variable-length decoder
analyses the bitstream and generates a write enable signal
WE so as to determine which part of the bitstream is stored
in the memory. The memory is read out at a lower bitrate so
as to constitute an elementary video stream representing the
ancillary video signal Va.

accommodated in the picture header is decoded. In the step
22, it is ascertained whether said picture coding type
indicates that an I-picture is being received. If that is not the
case, the VLD returns to step 20 to await the next picture
start code.

If the picture is an I-picture, a step 40 is performed in
which the VLD allows the picture header to be kept in the
memory by generating an appropriate write enable signal. In
a step 41, the slice header is received and stored in the
memory. A macroblock is now being received. In a step 42,
all macroblock data up to the first block is written in the
memory.

In a step 43, the VLD detects the presence of a DC
coefficient of a block within the current macroblock and
allows this coefficient to be stored in the memory. In a step
44, the subsequent coefficients of the block up to the
detection of an end-of-block code are discarded. The VLD
refrains from generating the write enable signal while these
coefficients are received. In a step 45, the end-of-block
code is stored in the memory. Steps 27-29 are the same as
the corresponding steps shown in FIG. 2. They facilitate the
check of whether or not the current I-picture has been
processed.

The arrangement shown in FIG. 3 thus copies the main
bitstream Vm in memory 30, thereby ignoring the P- and
B-pictures as well as the non-DC-coefficients of I-pictures.
The ancillary video signal Va obtained by reading out the
memory 30 is the same as that created by the arrangement
shown in FIG. 1, but is now [illegible] a
further elementary signal. It is a low bitrate replica of the
main signal with a reduced spatial and temporal resolution.

FIG. 5 shows a diagram of a transmitter according to the
invention. The transmitter comprises a demultiplexer 50, a
transcoder 51, a circuit 52 for regenerating program specific
information, and a remultiplexer 53. The arrangement
receives a packetized transport stream TS1. Said transport
stream comprises a plurality of audiovisual programs, each
program being formed by one or more elementary streams
(e.g. video, audio, data). The transport stream also comprises
packets accommodating program-specific information
(PSI). PSI packets specify which programs [illegible]
how many and which elementary streams each
program comprises. A detailed description of transport
streams and program-specific information can be found in
"ISO/IEC CD 13818-1: Information technology-Generic
coding of moving pictures and associated audio
information— Part 1: Systems", Dec. 1, 1993, also known as
the MPEG2 systems standard.

Demultiplexer 50 selects an elementary video stream Vm
from which the ancillary signal Va is to be derived. Other
elementary streams E1, E2, E3 remain unprocessed in this
embodiment. The main video signal Vm is applied to
transcoder 51 which may take the form of the arrangement
shown in FIG. 3, already discussed. The transcoder outputs
an ancillary video signal Va in the form of a further
elementary stream. Demultiplexer 50 also extracts
program-specific information packets PSIi from the transport stream
TS1 and applies them to circuit 52. This circuit updates the
program-specific information so as to specify that ancillary
signal Va is present in output transport stream TS2. The
circuit further specifies that Va is associated with the same

The operation of the arrangement shown in FIG. 3 will audiovisual program as the main elementary stream Vm 

now be elucidated with reference to a flow chart shown in from which it has been derived. The updated program- 

FIG. 4. The steps 20-22 are the same as the corresponding specific information PS 12 and the ancillary elementary 

steps shown in FIG. 2. Thus, in the step 20, the VLD reads 65 stream Va are then added by re multiplexer 53 to the original 

the input bitstream and discards all data until a picture start elementary streams and retransmitted as a new transport 

code is encountered. In the step 21, the picture coding type stream TS2. 
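The data flow through the transmitter of FIG. 5 can be illustrated with a minimal sketch. It is not the circuit described in the specification: the `Picture` and `Program` records, the stream names and the PID values are hypothetical, and the ancillary stream is formed here simply by keeping whole I-pictures (the alternative embodiment), not by extracting DC coefficients.

```python
from dataclasses import dataclass, field


@dataclass
class Picture:
    coding_type: str  # "I", "P" or "B"
    payload: bytes


@dataclass
class Program:
    number: int
    # Hypothetical PSI content: elementary stream name -> packet identifier
    streams: dict = field(default_factory=dict)


def derive_ancillary(main_stream):
    """Keep only the autonomously encoded (I) pictures of the main stream."""
    return [pic for pic in main_stream if pic.coding_type == "I"]


def update_psi(program, ancillary_name, ancillary_pid):
    """Return a new PSI entry that also announces the ancillary stream."""
    streams = dict(program.streams)
    streams[ancillary_name] = ancillary_pid
    return Program(program.number, streams)


# Main elementary stream Vm with a mix of picture types (illustrative data).
vm = [Picture("I", b"i0"), Picture("B", b"b1"),
      Picture("P", b"p2"), Picture("I", b"i3")]
va = derive_ancillary(vm)

psi1 = Program(number=1, streams={"Vm": 0x100, "A": 0x101})
psi2 = update_psi(psi1, "Va", 0x102)  # Va is tied to the same program as Vm
```

Returning a new `Program` record instead of mutating `psi1` mirrors the distinction the text draws between the incoming information PSI1 and the updated information PSI2.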
FIG. 6 shows a diagram of a digital picture-in-picture (PIP)
television receiver according to the invention. The receiver comprises
a demultiplexer 60, an MPEG2 audio decoder 61, an MPEG2 video decoder
62, a PIP-decoder 63, and a video mixer 64. The demultiplexer 60
receives an MPEG2 transport stream TS and extracts therefrom an
elementary audio stream A1 and an elementary video stream V1
associated with a desired program. The elementary streams A1 and V1
are decoded by audio decoder 61 and video decoder 62 respectively. The
decoded audio signal is applied to a loudspeaker 65. The demultiplexer
further extracts from the transport stream TS a further elementary
video stream V2 associated with a different program which is to be
displayed as picture-in-picture. The further elementary stream V2 is
decoded by PIP-decoder 63 and converted into a signal V2' with a
reduced size and temporal resolution. Both video signals V1 and V2'
are mixed in video mixer 64 and displayed on a display screen 66.

In a first embodiment of the PIP-receiver, elementary stream V2
defines a full-size, full-resolution MPEG-encoded video signal,
including I, P and B-pictures. In this embodiment, PIP-decoder 63
takes the form of the circuit shown in FIG. 1, already discussed. In a
second embodiment of the PIP-receiver, the elementary stream V2 is
assumed to be an ancillary video stream transmitted by an arrangement
as shown in FIG. 5. As explained with reference to FIG. 5, the
elementary stream V2 comprises DC-coefficients of I-pictures only. In
this embodiment, PIP-decoder 63 also takes the form of the circuit
shown in FIG. 1, already discussed. However, the variable-length
decoder (10 in FIG. 1) is simpler because various types of overhead
data are absent in the bitstream and thus do not need to be processed.

FIG. 7 shows a diagram of a digital multi-picture-in-picture (MPIP)
television receiver according to the invention. The receiver comprises
a transport stream demultiplexer 70, an MPEG2 audio decoder 71, a
loudspeaker 72, an MPEG2 video decoder 73, a MPIP-decoder 74, a
display screen 75 and a control circuit 76. The demultiplexer 70
receives an MPEG2 transport stream TS and extracts therefrom an
elementary audio stream Aj and an elementary video stream Vj, both
associated with a program number j. The elementary streams Aj and Vj
are decoded by audio decoder 71 and video decoder 73, respectively.
The decoded audio signal is applied to loudspeaker 72. The decoded
video signal is displayable, via a switch 77, on display screen 75.
The demultiplexer further extracts from the transport stream TS a
further elementary video stream Vi associated with a program i. The
further elementary stream Vi may define a full-size, full-resolution
MPEG-encoded video signal, including I, P and B-pictures. The further
elementary video stream Vi may also be an ancillary video stream
transmitted by an arrangement as shown in FIG. 5. In the latter case,
the ancillary video signal comprises, as explained above,
DC-coefficients of I-pictures only.

MPIP-decoder 74 is adapted to decode the ancillary signal Vi. The
decoder comprises a variable-length decoder 741, an inverse quantizer
742 and a memory 743. The decoder has the same structure as the
arrangement shown in FIG. 1. However, memory 743 now has a plurality
of memory sections, addressed by a write address WA, each section
having the capacity to store a respective small-size picture.

In operation, control circuit 76 receives from demultiplexer 70 the
transport packets accommodating program-specific information PSI. As
already mentioned before, said packets specify which programs are
available, as well as how many and which elementary streams each
program comprises. The control circuit is adapted to read from the
PSI-data, for each available program i, the packet identifier PID
defining the transport packets conveying the ancillary video signal Vi
associated therewith. For a plurality of different programs, the
relevant PIDs are successively applied to the demultiplexer 70 so as
to apply the associated ancillary video signal Vi to MPIP-decoder 74.
Each decoded small-size picture is stored in a section of memory 743
under control of the write address WA generated by control circuit 76.
The plurality of pictures together constitutes a mosaic video picture
which can be displayed on display screen 75 via the switch 77.

Upon selection of one of the displayed miniature pictures (e.g. by a
cursor device not shown), the control circuit 76 converts the selected
display screen position into the program number associated therewith,
and controls demultiplexer 70 so as to select the audio stream Aj and
video stream Vj associated with the selected program. The control
circuit further controls switch 77 so as to display the selected
program in full size and resolution on display screen 75 and to
reproduce its sound via loudspeaker 72.

FIGS. 8A and 8B show two embodiments of a digital video recorder
according to the invention. The recorder receives an MPEG2 transport
stream TS and comprises a demultiplexer 80 to select therefrom an
elementary audio stream A and an elementary video stream V. The
elementary video stream is assumed to have been scrambled such that
only the predictively encoded (P and B) pictures are scrambled. Both
elementary streams are recorded on a digital storage medium 81.

In the embodiment shown in FIG. 8A, the video stream V is further
applied to a transcoder 82 which may take the form as shown in FIG. 1.
In the embodiment shown in FIG. 8B, the demultiplexer further selects
an ancillary elementary stream Va, which is transmitted by an
arrangement as shown in FIG. 5. The ancillary stream is applied to a
"simple" MPEG decoder 89 as explained hereinbefore with reference to
FIG. 6.

In both embodiments, the ancillary signal Va comprises the
DC-coefficients of I-pictures of the same program as video signal V.
This ancillary signal is separately recorded on storage medium 81.
Upon normal playback, the recorded audio and video elementary streams
A and V are decoded by MPEG2 audio decoder 83 and MPEG2 video decoder
85, respectively. The audio signal is applied to an audio output
terminal 84. If the video signal has been scrambled, it can only be
displayed when processed by a descrambler 86. The descrambler may take
the form of a circuit which is activated only upon insertion of a
smart card holding a sufficient amount of credit. These types of
descramblers are known per se in the art. The decoded and descrambled
video signal is then applied, via a switch 87, to a video output 88.

The video recorder can optionally output, via switch 87, the
separately recorded ancillary signal Va. Said signal comprises
DC-coefficients of I-pictures only and is thus not scrambled.
Displaying this signal in reduced size or, after suitable upsampling
(not shown), in full size but with reduced resolution, allows the user
to scan the storage medium 81 for locating the start of a particular
scrambled program without having yet to pay therefor. It is to be
noted that in the embodiment of FIG. 8B the "simple" decoder 89 can
also be located in the reproduction part (i.e. between storage medium
81 and switch 87) of the video recorder.

In summary, the invention relates to a method and arrangement for
deriving an ancillary signal from a compressed digital video signal
(e.g. MPEG). The DC coefficients of autonomously encoded pictures
(I-pictures) are selected from the compressed signal. The ancillary
signal thus obtained can be used for display in a (multi-)
picture-in-picture television receiver. If the main signal is
scrambled, the ancillary signal can be used as an "appetizer" in order
to encourage the user to pay a subscription fee. The ancillary signal
can separately be recorded in digital video recorders so as to assist
the user in finding the beginning of a scrambled program on tape. The
ancillary signal can also be generated at the transmitter end and
transmitted at a low bit rate. A decoder for decoding such an
ancillary signal is considerably simpler and less expensive than a
full-spec MPEG decoder.
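The statement that each DC coefficient represents the average value of an 8*8 pixel block can be checked numerically. The sketch below is illustrative only: it assumes the usual 8x8 DCT normalization, in which the (0,0) term equals the sum of the 64 samples divided by 8, and the function names and the test picture are made up for the example.

```python
BLOCK = 8


def dc_coefficient(block):
    """(0,0) term of the 8x8 DCT: with the usual normalization this is
    the sum of all 64 samples divided by 8, i.e. 8 times the block mean."""
    return sum(sum(row) for row in block) / 8.0


def thumbnail(picture):
    """Build a 1/8-size picture holding one sample (DC / 8) per 8x8 block."""
    height, width = len(picture), len(picture[0])
    small = []
    for top in range(0, height, BLOCK):
        out_row = []
        for left in range(0, width, BLOCK):
            block = [row[left:left + BLOCK]
                     for row in picture[top:top + BLOCK]]
            out_row.append(dc_coefficient(block) / 8.0)
        small.append(out_row)
    return small


# A 16x16 test picture made of four flat 8x8 blocks.
values = [[10, 20], [30, 40]]
picture = [[values[r // 8][c // 8] for c in range(16)] for r in range(16)]
small = thumbnail(picture)
```

A decoder that only needs this miniature picture never has to run an inverse DCT, which is consistent with the remark that such a decoder is much simpler than a full MPEG decoder.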
What is claimed is: 

1. A transmitter comprising: 

means for generating a main elementary video stream that
includes autonomously encoded pictures and predic- 
tively encoded pictures; 

means for selecting from the main elementary video 
stream only the autonomously encoded pictures; 

means for arranging the autonomously encoded pictures 
to generate an ancillary elementary video stream; 

means for multiplexing the main elementary video stream 
and the ancillary elementary video stream to generate a 
transport stream; and,

means for transmitting the transport stream wherein the 
transport stream is transmitted over a transmission 
channel to a receiver that is located remotely from the 
transmitter. 

2. The transmitter as set forth in claim 1, wherein the
transport stream further includes a plurality of additional 
elementary video streams corresponding to a plurality of 
different programs. 

3. The transmitter as set forth in claim 1, wherein the 
ancillary elementary video stream comprises a low bitrate
replica of the main elementary video stream. 

4. The transmitter as set forth in claim 1, wherein: 

the main elementary video stream comprises a main 
MPEG-encoded elementary video stream; 

the ancillary elementary video stream comprises an ancillary
MPEG-encoded elementary video stream; and,

the transport stream comprises an MPEG transport 
stream. 

5. The transmitter as set forth in claim 1, wherein the
means for selecting comprises means for selecting from the 
main elementary video stream all of the autonomously- 
encoded pictures included in the main elementary video 
stream, but none of the predictively-encoded pictures
included in the main elementary video stream.

6. The transmitter as set forth in claim 3, wherein the low 
bitrate replica of the main elementary video stream is 
arranged to be displayed as a picture-in-picture by the 
receiver. 

7. The transmitter as set forth in claim 1, wherein the
ancillary elementary video stream is arranged to be dis- 
played by the decoder as a separate one of a plurality of 
pictures that together form a mosaic picture. 

8. A receiver, comprising: 

means for receiving a transport stream that includes
multiplexed main and ancillary elementary video 
streams; 

means for de-multiplexing the main and ancillary elementary
video streams to generate separate main and ancillary
elementary video streams; and,

wherein the main elementary video stream includes
autonomously encoded pictures and predictively
encoded pictures, and the ancillary elementary video
stream includes only autonomously encoded pictures 
from the main elementary video stream. 

9. The receiver as set forth in claim 8, wherein the 
transport stream further includes a plurality of additional 
elementary video streams corresponding to a plurality of 
different programs. 

10. The receiver as set forth in claim 8, wherein the 
transport stream is received from a transmitter that is located 
remotely from the receiver. 

11. The receiver as set forth in claim 8, wherein: 

the main elementary video stream comprises a main 
MPEG-encoded elementary video stream; 

the ancillary elementary video stream comprises an ancil- 
lary MPEG-encoded elementary video stream; and, 

the transport stream comprises an MPEG transport 
stream. 

12. The receiver as set forth in claim 8, wherein the 
ancillary elementary video stream contains all of the 
autonomously-encoded pictures included in the main 
elementary video stream, but none of the predictively- 
encoded pictures included in the main elementary video 
stream.

13. The receiver as set forth in claim 8, further comprising 
means for displaying the ancillary elementary video stream 
as a picture-in-picture. 

14. The receiver as set forth in claim 8, further comprising 
means for displaying the ancillary elementary video stream 
as a separate one of a plurality of pictures that together form 
a mosaic picture. 

15. A system, comprising: 
a transmitter that includes: 

means for generating a main elementary video stream 
that includes autonomously encoded pictures and 
predictively encoded pictures; 

means for selecting from the main elementary video 
stream only the autonomously encoded pictures; 

means for arranging the autonomously encoded pic- 
tures to generate an ancillary elementary video 
stream; 

means for multiplexing the main elementary video 
stream and the ancillary elementary video stream to 
generate a transport stream; and, 
means for transmitting the transport stream; and, 
a receiver that includes: 

means for receiving the transport stream; and, 
means for de-multiplexing the main and ancillary 
elementary video streams to generate separate 
main and ancillary elementary video streams. 

16. The system as set forth in claim 15, wherein the 
transport stream further includes a plurality of additional 
elementary video streams corresponding to a plurality of 
different programs. 

17. The system as set forth in claim 15, wherein: 

the main elementary video stream comprises a main 
MPEG-encoded elementary video stream; 

the ancillary elementary video stream comprises an ancil- 
lary MPEG-encoded elementary video stream; and, 

the transport stream comprises an MPEG transport 
stream. 

18. The system as set forth in claim 15, wherein the 
ancillary elementary video stream contains all of the 
autonomously-encoded pictures included in the main 
elementary video stream, but none of the predictively- 
encoded pictures included in the main elementary video 
stream.

19. A method, comprising:

generating a main elementary video stream that includes 
autonomously encoded pictures and predictively 
encoded pictures; 

selecting from the main elementary video stream only the 
autonomously encoded pictures; 

arranging the autonomously encoded pictures to generate 
an ancillary elementary video stream; 

multiplexing the main elementary video stream and the 
ancillary elementary video stream to generate a trans- 
port stream; and, 

transmitting the transport stream over a transmission 
channel to a receiver that is located remotely from the 
transmitter. 

20. The transmitter as set forth in claim 1, further comprising
means for updating and transmitting a program
specific information stream to specify the presence of the
ancillary elementary video stream and the associated main
elementary video stream.

21. The receiver as set forth in claim 8, wherein the means
for de-multiplexing have been arranged to obtain an updated 
program-specific information stream that specifies the pres- 
ence of the ancillary elementary video stream and the 
associated elementary video stream. 

22. The system as set forth in claim 15, wherein the 
transport stream further includes an updated program- 
specific information stream that specifies the presence of the 
ancillary elementary video stream and the associated MPEG 
elementary video stream.

A method transcodes groups of macroblocks of a partially
decoded input bitstream. The groups of macroblocks include
intra-mode and inter-mode macroblocks. Each macroblock
includes DCT coefficients and at least one motion vector.
The modes of each group of macroblocks are mapped to be
identical only if there is an inter-mode macroblock and an
intra-mode macroblock in the group. If any of the macroblocks in
the group are mapped, then the DCT coefficients and the
motion vector for such mapped macroblocks are modified in
accordance with the mapping to generate a reduced-resolution
macroblock for an output compressed bitstream to compensate
for drift.
VIDEO TRANSCODER WITH SPATIAL RESOLUTION REDUCTION

FIELD OF THE INVENTION

This invention relates generally to the field of transcoding
bitstreams, and more particularly to reducing spatial resolution while
transcoding video bitstreams.

BACKGROUND OF THE INVENTION

Video compression enables the storing, transmitting, and processing of
visual information with fewer storage, network, and processor
resources. The most widely used video compression standards include
MPEG-1 for storage and retrieval of moving pictures, MPEG-2 for
digital television, and H.263 for video conferencing, see ISO/IEC
11172-2:1993, "Information Technology-Coding of Moving Pictures and
Associated Audio for Digital Storage Media up to about 1.5
Mbit/s-Part 2: Video;" D. LeGall, "MPEG: A Video Compression Standard
for Multimedia Applications," Communications of the ACM, Vol. 34, No.
4, pp. 46-58, 1991; ISO/IEC 13818-2:1996, "Information Technology —
Generic Coding of Moving Pictures and Associated Audio
Information-Part 2: Video," 1994; ITU-T SG XV, DRAFT H.263, "Video
Coding for Low Bitrate Communication," 1996; and ITU-T SG XVI, DRAFT13
H.263+ Q15-A-60 rev.0, "Video Coding for Low Bitrate Communication,"
1997.

These standards are relatively low-level specifications that primarily
deal with a spatial compression of images or frames, and the spatial
and temporal compression of sequences of frames. As a common feature,
these standards perform compression on a per frame basis. With these
standards, one can achieve high compression ratios for a wide range of
applications.

Newer video coding standards, such as MPEG-4 for multimedia
applications, see ISO/IEC 14496-2:1999, "Information technology—
coding of audio/visual objects, Part 2: Visual," allow
arbitrary-shaped objects to be encoded and decoded as separate video
object planes (VOP). The objects can be visual, audio, natural,
synthetic, primitive, compound, or combinations thereof. Also, there
is a significant amount of error resilience features built into this
standard to allow for robust transmission across error-prone channels,
such as wireless channels.

The emerging MPEG-4 standard is intended to enable multimedia
applications, such as interactive video, where natural and synthetic
materials are integrated, and where access is universal. In the
context of video transmission, these compression standards are needed
to reduce the amount of bandwidth on networks. The networks can be
wireless or the Internet. In any case, the network has limited
capacity, and contention for scarce resources should be minimized.

A great deal of effort has been placed on systems and methods that
enable devices to transmit the content robustly and to adapt the
quality of the content to the available network resources. When the
content is encoded, it is sometimes necessary to further decode the
bitstream before it can be transmitted through the network at a lower
bit-rate or resolution.

As shown in FIG. 1, this can be accomplished by a transcoder 100. In a
simplest implementation, the transcoder 100 includes a cascaded
decoder 110 and encoder 120. A compressed input bitstream 101 is fully
decoded at an input bit-rate R_in, then encoded at an output bit-rate
R_out 102 to produce the output bitstream 103. Usually, the output
rate is lower than the input rate. In practice, full decoding and full
encoding in a transcoder is not done due to the high complexity of
encoding the decoded bitstream.

Earlier work on MPEG-2 transcoding has been published by Sun et al.,
in "Architectures for MPEG compressed bitstream scaling," IEEE
Transactions on Circuits and Systems for Video Technology, April 1996.
There, four methods of rate reduction, with varying complexity and
architecture, were described.

FIG. 2 shows a first example method 200, which is referred to as an
open-loop architecture. In this architecture, the input bitstream 201
is only partially decoded. More specifically, macroblocks of the input
bitstream are variable-length decoded (VLD) 210 and inverse quantized
220 with a fine quantizer Q1, to yield discrete cosine transform (DCT)
coefficients. Given the desired output bit-rate 202, the DCT blocks
are first re-quantized by a coarser level quantizer Q2 of the
quantizer 230. These re-quantized blocks are then variable-length
coded (VLC) 240, and a new output bitstream 203 at a lower rate is
formed. This scheme is much simpler than the scheme shown in FIG. 1
because the motion vectors are re-used and an inverse DCT operation is
not needed. Note, here the choice of Q1 and Q2 strictly depends on
rate characteristics of the bitstream. Other factors, such as,
possibly, spatial characteristics of the bitstream, are not
considered.

FIG. 3 shows a second example method 300. This method is referred to
as a closed-loop architecture. In this method, the input video
bitstream is again partially decoded, i.e., macroblocks of the input
bitstream are variable-length decoded (VLD) 310, and inverse quantized
320 with Q1 to yield discrete cosine transform (DCT) coefficients 321.
In contrast to the first example method described above, correction
DCT coefficients 332 are added 330 to the incoming DCT coefficients
321 to compensate for the mismatch produced by re-quantization. This
correction improves the quality of the reference frames that will
eventually be used for decoding. After the correction has been added,
the newly formed blocks are re-quantized 340 with Q2 to satisfy a new
rate, and variable-length coded 350, as before. Note, again Q1 and Q2
are rate based.

To obtain the correction component 332, the re-quantized DCT
coefficients are inverse quantized 360 and subtracted 370 from the
original partially decoded DCT coefficients. This difference is
transformed to the spatial domain via an inverse DCT (IDCT) 365 and
stored into a frame memory 380. The motion vectors 381 associated with
each incoming block are then used to recall the corresponding
difference blocks, such as in motion compensation 290. The
corresponding blocks are then transformed via the DCT 332 to yield the
correction component. A derivation of the method shown in FIG. 3 is
described in "A frequency domain video transcoder for dynamic bit-rate
reduction of MPEG-2 bitstreams," by Assuncao et al., IEEE Transactions
on Circuits and Systems for Video Technology, pp. 953-957, 1998.

Assuncao et al. also described an alternate method for the same task.
In the alternative method, they used a motion compensation (MC) loop
operating in the frequency domain for drift compensation. Approximate
matrices were derived for fast computation of the MC blocks in the
frequency domain. A Lagrangian optimization was used to calculate the
best quantizer scales for transcoding. That alternative method removed
the need for the IDCT/DCT components.

According to prior art compression standards, the number of bits
allocated for encoding texture information is controlled by a
quantization parameter (QP). The above methods are similar in that
changing the QP based on information that is contained in the original
bitstream reduces the rate of texture bits. For an efficient
implementation, the information is usually extracted directly from the
compressed domain and can include measures that relate to the motion
of macroblocks or residual energy of DCT blocks. The methods described
above are only applicable for bit-rate reduction.
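The open-loop re-quantization of FIG. 2, and the mismatch that the closed-loop architecture of FIG. 3 feeds back as a correction, can be sketched as follows. This is a simplified illustration, assuming plain uniform scalar quantizers with step sizes `q1` and `q2`; real MPEG quantizers also use weighting matrices, and the function names and sample levels are invented for the example.

```python
def inverse_quantize(levels, step):
    """Reconstruct DCT coefficients from quantizer levels."""
    return [level * step for level in levels]


def quantize(coeffs, step):
    """Uniform scalar quantizer (assumed here for simplicity)."""
    return [int(round(c / step)) for c in coeffs]


def requantize_open_loop(levels, q1, q2):
    """FIG. 2 style: inverse quantize with Q1, re-quantize with coarser Q2."""
    return quantize(inverse_quantize(levels, q1), q2)


def requantization_error(levels, q1, q2):
    """Difference between the original coefficients and the coefficients
    a decoder reconstructs after re-quantization. An open-loop transcoder
    ignores this mismatch; a closed-loop one compensates for it."""
    original = inverse_quantize(levels, q1)
    rebuilt = inverse_quantize(requantize_open_loop(levels, q1, q2), q2)
    return [o - r for o, r in zip(original, rebuilt)]


levels_in = [12, -5, 3, 1, 0]  # levels decoded from the input bitstream
levels_out = requantize_open_loop(levels_in, q1=4, q2=10)
error = requantization_error(levels_in, q1=4, q2=10)
```

The coarser step produces smaller levels and more zeros, which is what lowers the bit-rate; the nonzero entries of `error` are the source of drift when the affected blocks serve as references for predicted pictures.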

Besides bit-rate reduction, other types of transformation
of the bitstream can also be performed. For example, object-
based transformations have been described in U.S. patent
application Ser. No. 09/504,323, "Object-Based Bitstream
Transcoder," filed on Feb. 14, 2000 by Vetro et al.
Transformations on the spatial resolution have been described in
"Heterogeneous video transcoding to lower spatio-temporal
resolutions, and different encoding formats," IEEE
Transactions on Multimedia, June 2000, by Shanableh and
Ghanbari.

It should be noted that these methods produce bitstreams at a
reduced spatial resolution that lack quality, or are
accomplished with high complexity. Also, proper consideration
has not been given to the means by which reconstructed
macroblocks are formed. This can impact both the
quality and complexity, and is especially important when
considering reduction factors different than two. Moreover,
these methods do not specify any architectural details. Most
of the attention is spent on various means of scaling motion
vectors by a factor of two.

FIG. 4 shows the details of a method 400 for transcoding
an input bitstream to an output bitstream 402 at a lower
spatial resolution. This method is an extension of the method
shown in FIG. 1, but with the details of the decoder 110 and
encoder 120 shown, and a down-sampling block 410
between the decoding and encoding processes. The decoder
110 performs a partial decoding of the bitstream. The
down-sampler reduces the spatial resolution of groups of
partially decoded macroblocks. Motion compensation 420 in the
decoder uses the full-resolution motion vectors mv_f 421,
while motion compensation 430 in the encoder uses low-
resolution motion vectors mv_r 431. The low-resolution
motion vectors are either estimated from the down-sampled
spatial domain frames y n J 403, or mapped from the full-
resolution motion vectors. Further details of the transcoder
400 are described below.

FIG. 5 shows the details of an open-loop method 500 for
transcoding an input bitstream 501 to an output bitstream
502 at a lower spatial resolution. In this method, the video
bitstream is again partially decoded, i.e., macroblocks of the
input bitstream are variable-length decoded (VLD) 510 and
inverse quantized 520 to yield discrete cosine transform
(DCT) coefficients; these steps are well known.

The DCT macroblocks are then down-sampled 530 by a
factor of two by masking the high frequency coefficients of
each 8x8 (2^3x2^3) luminance block in the 16x16 (2^4x2^4)
macroblock to yield four 4x4 DCT blocks, see U.S. Pat. No.
5,262,854, "Low-resolution HDTV receivers," issued to Ng
on Nov. 16, 1993. In other words, down-sampling turns a
group of blocks, for example four, into a group of four
blocks of a smaller size.
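Down-sampling by masking can be illustrated on a single block. The sketch below is a toy example, not the method of the cited patent: it assumes orthonormal DCT scaling, under which the retained 4x4 low-frequency coefficients must be divided by two so that the 4x4 inverse DCT returns pixel values in the original range. All function names are hypothetical.

```python
import math


def _scale(k, n):
    """Orthonormal DCT scaling factor for frequency index k."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block (list of lists)."""
    n = len(block)
    return [[_scale(u, n) * _scale(v, n) * sum(
        block[x][y]
        * math.cos((2 * x + 1) * u * math.pi / (2 * n))
        * math.cos((2 * y + 1) * v * math.pi / (2 * n))
        for x in range(n) for y in range(n))
        for v in range(n)] for u in range(n)]


def idct_2d(coeffs):
    """Inverse of dct_2d."""
    n = len(coeffs)
    return [[sum(
        _scale(u, n) * _scale(v, n) * coeffs[u][v]
        * math.cos((2 * x + 1) * u * math.pi / (2 * n))
        * math.cos((2 * y + 1) * v * math.pi / (2 * n))
        for u in range(n) for v in range(n))
        for y in range(n)] for x in range(n)]


def mask_high_frequencies(coeffs8):
    """Keep the 4x4 low-frequency corner of an 8x8 DCT block.
    The division by 2 is the rescaling assumed for orthonormal DCTs."""
    return [[coeffs8[u][v] / 2.0 for v in range(4)] for u in range(4)]


flat = [[100.0] * 8 for _ in range(8)]      # a uniform 8x8 luminance block
low = mask_high_frequencies(dct_2d(flat))   # 4x4 DCT block
pixels = idct_2d(low)                       # 4x4 block back in the pixel domain
```

Four such 4x4 blocks cover the area of one 8x8 block in the reduced picture, which is why the surrounding text notes that further steps are needed to re-form a compliant 16x16 macroblock.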

By performing down-sampling in the transcoder, the
transcoder must take additional steps to re-form a compliant
16x16 macroblock, which involves transformation back to
the spatial domain, then again to the DCT domain. After the
down-sampling, blocks are re-quantized 540 using the same
quantization level, and then variable length coded 550. No
methods have been described to perform rate control on the
reduced resolution blocks.

To perform motion vector mapping 560 from full 559 to 
reduced 561 motion vectors, several methods suitable for 
frame-based motion vectors have been described in the prior 
art. To map from four frame-based motion vectors, i.e., one 
for each macroblock in a group, to one motion vector for the 
newly formed 16x16 macroblock, simple averaging or 
median filters can be applied. This is referred to as a 4:1 
mapping. 
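A 4:1 mapping by averaging or by a median filter might look like the sketch below. Two choices in it are assumptions and do not come from the text: the median is taken per component, and the mapped vector is multiplied by `scale=0.5` to account for the halved resolution. The function names and the sample vectors are likewise illustrative.

```python
from statistics import median


def map_mv_average(mvs, scale=0.5):
    """Average four frame-based motion vectors, then rescale the result
    for the reduced resolution (scale is an assumed parameter)."""
    n = len(mvs)
    mean_x = sum(x for x, _ in mvs) / n
    mean_y = sum(y for _, y in mvs) / n
    return (mean_x * scale, mean_y * scale)


def map_mv_median(mvs, scale=0.5):
    """Component-wise median of the motion vectors, then rescale."""
    med_x = median(x for x, _ in mvs)
    med_y = median(y for _, y in mvs)
    return (med_x * scale, med_y * scale)


# One vector per 16x16 macroblock in the group; the last one is an outlier.
group = [(4, 2), (6, 2), (4, 0), (20, -8)]
mv_avg = map_mv_average(group)
mv_med = map_mv_median(group)
```

With these sample vectors the average is pulled toward the outlier, while the median stays close to the three similar vectors, which is the usual reason for preferring a median filter when one macroblock in the group moves differently.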

However, certain compression standards, such as 
MPEG-4 and H.263, support advanced prediction modes 
that allow one motion vector per 8x8 block. In this case, 
each motion vector is mapped from a 16x16 macroblock in 
the original resolution to an 8x8 block in the reduced 
resolution macroblock. This is referred to as a 1:1 mapping. 

FIG. 6 shows possible mappings 600 of motion vectors
from a group of four 16x16 macroblocks 601 to either one 
16x16 macroblock 602 or four 8x8 macroblocks 603. It is 
inefficient to always use the 1:1 mapping because more bits 
are used to code four motion vectors. Also, in general, the 
extension to field-based motion vectors for interlaced 
images is non-trivial. Given the down-sampled DCT coef- 
ficients and mapped motion vectors, the data are subject to 
variable length coding and the reduced resolution bitstream 
can be formed as is well known. 

It is desired to provide a method for transcoding bit- 
streams that overcomes the problems of the prior art meth- 
ods for spatial resolution reduction. Furthermore, it is 
desired to provide a balance between complexity and quality 
in the transcoder. Furthermore, it is desired to compensate for
drift, and provide better up-sampling techniques during the 
transcoding. 

SUMMARY OF THE INVENTION 

A method up-samples a compressed bitstream. The compressed
bitstream is partially decoded to produce macroblocks.
Each macroblock has DCT coefficients according to a
predetermined dimensionality of the macroblock.

DCT filters are applied to the DCT coefficients of each
macroblock to generate up-sampled macroblocks. For each
macroblock, there is one up-sampled macroblock generated
by each filter. Each generated up-sampled macroblock has
the predetermined dimensionality.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram of a prior art cascaded 
transcoder; 

FIG. 2 is a block diagram of a prior art open-loop 
transcoder for bit-rate reduction; 

FIG. 3 is a block diagram of a prior art closed-loop 
transcoder for bit-rate reduction; 

FIG. 4 is a block diagram of a prior art cascaded 
transcoder for spatial resolution reduction; 

FIG. 5 is a block diagram of a prior art open-loop 
transcoder for spatial resolution reduction; 

FIG. 6 is a block diagram of prior art motion vector 
mapping; 

FIG. 7 is a block diagram of a first application transcoding 
a bitstream to a reduced spatial resolution according to the 
invention; 

FIG. 8 is a block diagram of a second application 
transcoding a bitstream to a reduced spatial resolution 
according to the invention; 
FIG. 9 is a block diagram of an open-loop transcoder for 
spatial resolution reduction according to the invention; 

FIG. 10 is a block diagram of a first closed-loop 
transcoder for spatial resolution reduction with drift 
compensation in the reduced resolution according to the invention; 

FIG. 11a is a block diagram of a second closed-loop 
transcoder for spatial resolution reduction with drift 
compensation in the original resolution according to the invention; 

FIG. 11b is a block diagram of a third closed-loop 
transcoder for spatial resolution reduction with drift 
compensation in the original resolution according to the invention; 

FIG. 12 is an example of a group of macroblocks containing 
macroblock modes, DCT coefficient data, and corresponding 
motion vector data; 

FIG. 13 is a block diagram of a group of blocks processor 
according to the invention; 

FIG. 14A is a block diagram of a first method for group 
of blocks processing according to the invention; 

FIG. 14B is a block diagram of a second method for group 
of blocks processing according to the invention; 

FIG. 14C is a block diagram of a third method for group 
of blocks processing according to the invention; 

FIG. 15A illustrates a prior art concept of down-sampling 
in the DCT or spatial domain; 

FIG. 15B is a block diagram of prior art up-sampling in 
the DCT or spatial domain; 

FIG. 15C is a block diagram of up-sampling in the DCT 
domain according to the invention; and 

FIG. 16 is a diagram of up-sampling in the DCT domain 
according to the invention. 

DETAILED DESCRIPTION OF PREFERRED 
EMBODIMENTS 

Introduction 

The invention provides a system and method for transcoding 
compressed bitstreams of digital video signals to a 
reduced spatial resolution with minimum drift. First, several 
applications for content distribution that can use the 
transcoder according to the invention are described. Next, an 
analysis of a basic method for generating a bitstream at a 
lower spatial resolution is provided. Based on this analysis, 
several alternatives to the base method and the corresponding 
architectures that are associated with each alternative are 
described. 

A first alternative, see FIG. 9, uses an open-loop 
architecture, while the other three alternatives, FIGS. 10 and 
11a-b, correspond to closed-loop architectures that provide 
a means of compensating drift incurred by down-sampling, 
re-quantization and motion vector truncation. One of the 
closed-loop architectures performs this compensation in the 
reduced resolution, while the others perform this compensation 
in the original resolution in the DCT domain for better 
quality. 

As will be described in greater detail below, the open-loop 
architecture of FIG. 9 is of low complexity. There is no 
reconstruction loop, no DCT/IDCT blocks, no frame store, 
and the quality is reasonable for low picture resolution and 
bit-rates. This architecture is suitable for Internet applications 
and software implementations. The first closed-loop 
architecture of FIG. 10 is of moderate complexity. It 
includes a reconstruction loop, IDCT/DCT blocks, and a 
frame store. Here, the quality can be improved with drift 
compensation in the reduced resolution domain. The second 
closed-loop architecture of FIG. 11a is of moderate complexity. 
It includes a reconstruction loop, IDCT/DCT blocks, 
and a frame store. The quality can be improved with drift 
compensation in the original resolution domain, and does 
require up-sampling of the reduced resolution frames. The 
third closed-loop architecture uses a correction signal 
obtained in the reduced resolution domain. 

To support the architectures according to the present 
invention, several additional techniques for processing 
blocks that would otherwise have groups of macroblocks 
with "mixed" modes at the reduced resolution are also 
described. 

A group of blocks, e.g., four, to be down-sampled is 
considered a "mixed" block when the group of blocks to be 
down-sampled contains blocks coded in both intra- and 
inter-modes. In the MPEG standards, I-frames contain only 
macroblocks coded according to the intra-mode, and 
P-frames can include intra- and inter-mode coded blocks. 
These modes need to be respected, particularly while 
down-sampling, otherwise the quality of the output can be 
degraded. 

Methods for drift compensation and up-sampling of 
DCT-based data are also described. These methods are useful for 
the second and third closed-loop architectures so that operations 
after the up-sampling can be performed properly and 
without additional conversion steps. 

Applications for Reduced Spatial Resolution Transcoding 

The primary target application for the present invention is 
the distribution of digital television (DTV) broadcast and 
Internet content to devices with low-resolution displays, 
such as wireless telephones, pagers, and personal digital 
assistants. MPEG-2 is currently used as the compression 
format for DTV broadcast and DVD recording, and 
MPEG-1 content is available over the Internet. 

Because MPEG-4 has been adopted as the compression 
format for video transmission over mobile networks, the 
present invention deals with methods for transcoding 
MPEG-1/2 content to lower resolution MPEG-4 content. 

FIG. 7 shows a first example of a multimedia content 
distribution system 700 that uses the invention. The system 
700 includes an adaptive server 701 connected to clients 702 
via an external network 703. Characteristically, the clients 
have small-sized displays or are connected by low bit-rate 
channels. Therefore, there is a need to reduce the resolution 
of any content distributed to the clients 702. 

Input source multimedia content 704 is stored in a database 
710. The content is subject to a feature extraction and 
an indexing process 720. A database server 740 allows the 
clients 702 to browse the content of the database 710 and to 
make requests for specific content. A search engine 730 can 
be used to locate multimedia content. After the desired 
content has been located, the database server 740 forwards 
the multimedia content to a transcoder 750 according to the 
invention. 

The transcoder 750 reads network and client characteristics. 
If the spatial resolution of the content is higher than the 
display characteristics of the client, then the method according 
to the invention is used to reduce the resolution of the 
content to match the display characteristics of the client. 
Also, if the bit-rate on the network channel is less than the 
bit-rate of the content, the invention can also be used. 

FIG. 8 shows a second example of a content distribution 
system 800. The system 800 includes a local "home" network 
801, the external network 703, a broadcast network 
803, and the adaptive server 701 as described for FIG. 7. In 
this application, high-quality input source content 804 can 
be transported to clients 805 connected to the home network 
801 via the broadcast network 803, e.g., cable, terrestrial or 
satellite. The content is received by a set-top box or gateway 
820 and stored into a local memory or hard-disk drive 
(HDD) 830. The received content can be distributed to the 
clients 805 within the home. In addition, the content can be 
transcoded 850 to accommodate any clients that do not have 
the capability to decode/display the full resolution content. 
This can be the case when a high-definition television 
(HDTV) bitstream is received for a standard-definition television 
set. Therefore, the content should be transcoded to 
satisfy client capabilities within the home. 

Moreover, if access to the content stored on the HDD 830 
is desired by a low-resolution external client 806 via the 
external network 802, then the transcoder 850 can also be 
used to deliver low-resolution multimedia content to this 
client. 

Analysis of Base Method 

In order to design a transcoder with varying complexity 
and quality, the signals generated by the method of FIG. 4 
are further described and analyzed. With regard to notation 
in the equations, lowercase variables indicate spatial domain 
signals, while uppercase variables represent the equivalent 
signal in the DCT domain. The subscripts on the variables 
indicate time, while a superscript equal to one denotes a 
signal that has drift and a superscript equal to two denotes 
a signal that is drift free. The drift is introduced through 
lossy processes, such as re-quantization, motion vector 
truncation or down-sampling. A method for drift compensation 
is described below. 

I-frames 

Because there is no motion compensated prediction for 
I-frames, i.e., 

$$x_n^1 = e_n^1, \qquad (1)$$

the signal is down-sampled 410, 

$$y_n^1 = D(x_n^1). \qquad (2)$$

Then, in the encoder 120, 

$$g_n^2 = y_n^1. \qquad (3)$$

The signal $g_n^2$ is subject to the DCT 440, then quantized 
450 with quantization parameter $Q_2$. The quantized signal 
$c_{out}$ is variable length coded 460 and written to the 
transcoded bitstream 402. As part of the motion compensation 
loop in the encoder, $c_{out}$ is inverse quantized 470 and 
subject to the IDCT 480. The reduced resolution reference 
signal $y_n^2$ 481 is stored into the frame buffer 490 as the 
reference signal for future frame predictions. 

P-frames 

In the case of P-frames, the identity 

$$x_n^1 = e_n^1 + M_f(x_{n-1}^1) \qquad (4)$$

yields the reconstructed full-resolution picture. As with the 
I-frame, this signal is then down-converted via equation (2). 
Then, the reduced-resolution residual is generated according 
to 

$$g_n^2 = y_n^1 - M_r(y_{n-1}^2), \qquad (5)$$

which is equivalently expressed as, 

$$g_n^2 = D(e_n^1) + D(M_f(x_{n-1}^1)) - M_r(y_{n-1}^2). \qquad (6)$$

The signal given by equation (6) represents the reference 
signal that the architectures described by this invention 
approximate. It should be emphasized that the complexity in 
generating this reference signal is high and is desired to 
approximate the quality, while achieving significant complexity 
reduction. 

Open-Loop Architecture 

Given the approximations, 

$$y_{n-1}^2 = y_{n-1}^1, \qquad (7a)$$

$$D(M_f(x_{n-1}^1)) = M_r(D(x_{n-1}^1)) = M_r(y_{n-1}^1), \qquad (7b)$$

the reduced resolution residual signal in equation (6) is 
expressed as, 

$$g_n^2 = D(e_n^1). \qquad (8)$$

The above equation suggests the open-loop architecture 
for a transcoder 900 as shown in FIG. 9. 

In the transcoder 900, the incoming bitstream 901 signal 
is variable length decoded 910 to generate quantized 
DCT coefficients 911, and full resolution motion vectors, 
$mv_f$ 902. The full-resolution motion vectors are mapped by 
the MV mapping 920 to reduced-resolution motion vectors, 
$mv_r$ 903. The quantized DCT coefficients 911 are inverse 
quantized, with quantizer $Q_1$ 930, to yield signal $E_n^1$ 931. 
This signal is then subject to a group of blocks processor 
1300 as described in greater detail below. The output of the 
processor 1300 is down-sampled 950 to produce signal $G_n^2$ 
951. After down-sampling, the signal is quantized with 
quantizer $Q_2$ 960. Finally, the reduced resolution 
re-quantized DCT coefficients and motion vectors are variable 
length coded 970 and written to the transcoded output 
bitstream 902. 

The details and preferred embodiments of the group of 
blocks processor 1300 are described below, but briefly, the 
purpose of the group of blocks processor is to pre-process 
selected groups of macroblocks to ensure that the 
down-sampling process 950 will not generate groups of macroblocks 
in which the sub-blocks have different coding modes, 
e.g., both inter- and intra-blocks. Mixed coding modes within 
a macroblock are not supported by any known video coding 
standards. 

Drift Compensation in Reduced Resolution 

Given only the approximation given by equation (7b), the 
reduced resolution residual signal in equation (6) is 
expressed as, 

$$g_n^2 = D(e_n^1) + M_r(y_{n-1}^1 - y_{n-1}^2). \qquad (9)$$

The above equation suggests the closed-loop architecture 
1000 shown in FIG. 10, which compensates for drift in the 
reduced resolution. 

In this architecture, the incoming signal 1001 is variable 
length decoded 1010 to yield quantized DCT coefficients 
1011 and full resolution motion vectors $mv_f$ 1012. The 
full-resolution motion vectors 1012 are mapped by the MV 
mapping 1020 to yield a set of reduced-resolution motion 
vectors, $mv_r$ 1021. The quantized DCT coefficients are 
inverse quantized 1030, with quantizer $Q_1$, to yield signal $E_n^1$ 
1031. This signal is then subject to the group of blocks 
processor 1300 and down-sampled 1050. After down-sampling 
1050, a reduced-resolution drift-compensating signal 
1051 is added 1060 to the low-resolution residual 1052 
in the DCT domain. 

The signal 1061 is quantized with spatial quantizer $Q_2$ 
1070. Finally, the reduced resolution re-quantized DCT 
coefficients 1071 and motion vectors 1021 are variable 
length coded 1080 to generate the output transcoded bitstream 
1002. 
The reference frame from which the reduced-resolution 
drift-compensating signal is generated is obtained by an 
inverse quantization 1090 of the re-quantizer residual $G_n^2$ 
1071, which is then subtracted 1092 from the down-sampled 
residual $G_n^2$ 1052. This difference signal is subject to the 
IDCT 1094 and added 1095 to the low-resolution predictive 
component 1096 of the previous macroblock stored in the 
frame store 1091. This new signal represents the difference 
$(y_{n-1}^1 - y_{n-1}^2)$ 1097 and is used as the reference for 
low-resolution motion compensation for the current block. 

Given the stored reference signal, low-resolution motion 
compensation 1098 is performed and the prediction is subject 
to the DCT 1099. This DCT-domain signal is the 
reduced-resolution drift-compensating signal 1051. This 
operation is performed on a macroblock-by-macroblock 
basis using the set of low-resolution motion vectors, $mv_r$ 
1021. 

First Method of Drift Compensation in Original Resolution 

For an approximation, 

$$M_r(y_{n-1}^2) = D(M_f(U(y_{n-1}^2))) = D(M_f(x_{n-1}^2)), \qquad (10)$$

the reduced resolution residual signal in equation (6) is 
expressed as, 

$$g_n^2 = D(e_n^1) + D(M_f(x_{n-1}^1 - x_{n-1}^2)). \qquad (11)$$

The above equation suggests the closed-loop architecture 
1100 shown in FIG. 11a, which compensates for drift in the 
original resolution bitstream. 

In this architecture, the incoming signal 1001 is variable 
length decoded 1110 to yield quantized DCT coefficients 
1111, and full resolution motion vectors, $mv_f$ 1112. The 
quantized DCT coefficients 1111 are inverse quantized 1130, 
with quantizer $Q_1$, to yield signal $E_n^1$ 1131. This signal is 
then subject to the group of blocks processor 1300. After 
group of blocks processing 1300, an original-resolution 
drift-compensating signal 1151 is added 1160 to the residual 
1141 in the DCT domain. The signal 1162 is then 
down-sampled 1150, and quantized 1170 with quantizer $Q_2$. 
Finally, the reduced resolution re-quantized DCT coefficients 
1171, and motion vectors 1121 are variable length 
coded 1180, and written to the transcoded bitstream 1102. 

The reference frame from which the original-resolution 
drift-compensating signal 1151 is generated is obtained by an 
inverse quantization 1190 of the re-quantizer residual $G_n^2$ 1171, 
which is then up-sampled 1191. Here, after the up-sampling, 
the up-sampled signal is subtracted 1192 from the original 
resolution residual 1161. This difference signal is subject to 
the IDCT 1194, and added 1195 to the original-resolution 
predictive component 1196 of the previous macroblock. 
This new signal represents the difference $(x_{n-1}^1 - x_{n-1}^2)$ 
1197, and is used as the reference for motion compensation 
of the current macroblock in the original resolution. 

Given the reference signal stored in the frame buffer 1181, 
original-resolution motion compensation 1198 is performed, 
and the prediction is subject to the DCT 1199. This 
DCT-domain signal is the original-resolution drift-compensating 
signal 1151. This operation is performed on a 
macroblock-by-macroblock basis using the set of original-resolution 
motion vectors, $mv_f$ 1121. 

Second Method of Drift Compensation in Original Resolution 

FIG. 11b shows an alternative embodiment of the closed 
loop architecture of FIG. 11a. Here, the output of the inverse 
quantization 1190 of the re-quantizer residual $G_n^2$ 1172 is 
subtracted 1192 from the reduced resolution signal before 
up-sampling 1191. 

Both drift compensating architectures in the original 
resolution do not use the motion vector approximations in 
generating the drift compensating signal 1151. This is 
accomplished by the use of up-sampling 1191. The two 
alternative architectures mainly differ in the choice of signals 
that are used to generate the difference signal. In the first 
method, the difference signal represents error due to 
re-quantization and resolution conversion, while the difference 
signal in the second method only considers the error 
due to re-quantization. 

Because the up-sampled signal is not considered in the 
future decoding of the transcoded bitstream, it is reasonable 
to exclude any error measured by consecutive down-sampling 
and up-sampling in the drift compensation signal. 
However, up-sampling is still employed for two reasons: to 
make use of the full-resolution motion vectors 1121 to avoid 
any further approximation, and so that the drift compensating 
signal is in the original resolution and can be added 1160 
to the incoming residual 1161 before down-sampling 1150. 

Mixed Block Processor 

The purpose of the group of blocks processor 1300 is to 
pre-process selected macroblocks to ensure that the 
down-sampling process does not generate macroblocks in which the 
sub-blocks have different coding modes, e.g., inter- and 
intra-blocks. Mixed coding modes within macroblocks are 
not supported by any known video coding standards. 

FIG. 12 shows an example of a group of macroblocks 
1201 that can lead to a group of blocks 1202 in the reduced 
resolution after transcoding 1203. Here, there are three 
inter-mode blocks, and one intra-mode block. Note, the 
motion vector (MV) for the intra-mode block is zero. 
Determining whether a particular group of blocks is a mixed 
group, or not, depends only on the macroblock mode. The 
group of blocks processor 1300 considers groups of four 
macroblocks 1201 that form a single macroblock 1202 in the 
reduced resolution. In other words, for the luminance 
component, MB(0) 1210 corresponds to sub-block b(0) 
1220 in the reduced resolution macroblock 1202, and 
similarly, MB(1) 1211 will correspond to b(1) 1221, MB(k) 
1212 corresponds to b(2) 1222, and MB(k+1) 1213 corresponds 
to b(3) 1223, where k is the number of macroblocks 
per row in the original resolution. Chrominance components 
are handled in a similar manner that is consistent with 
luminance modes. 

A group of MB modes determines whether the group of 
blocks processor 1300 should process a particular MB. The 
group of blocks is processed if the group contains at least 
one intra-mode block, and at least one inter-mode block. 
After a macroblock is selected, its DCT coefficients and 
motion vector data are subject to modification. 

FIG. 13 shows the components of the group of blocks 
processor 1300. For a selected group of mixed blocks 1301, 
the group of blocks processor performs mode mapping 
1310, motion vector modification 1320, and DCT coefficient 
modification 1330 to produce an output non-mixed block 
1302. Given that the group of blocks 1301 has been 
identified, the modes of the macroblocks are modified so that 
all macroblocks are identical. This is done according to a 
pre-specified strategy to match the modes of each sub-block 
in a reduced resolution block. 

In accordance with the chosen mode mapping, the MV 
data are then subject to modification 1320. Possible modifications 
that agree with corresponding mode mappings are 
described in detail below for FIGS. 14A-C. Finally, given 
both the new MB mode and the MV data, the corresponding 
DCT coefficients are also modified 1330 to agree with the 
mapping. 
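
The grouping and selection rule for mixed groups can be illustrated with a short sketch. The names `group_indices` and `is_mixed` are hypothetical, and the generalization from MB(0), MB(1), MB(k), MB(k+1) to an arbitrary group position in raster order is an assumption made for illustration.

```python
def group_indices(group_row, group_col, k):
    """Raster-order indices of the four macroblocks forming one reduced macroblock.

    k is the number of macroblocks per row in the original resolution.
    For the first group this yields MB(0), MB(1), MB(k), MB(k+1),
    which map to sub-blocks b(0), b(1), b(2), b(3).
    """
    base = 2 * group_row * k + 2 * group_col
    return [base, base + 1, base + k, base + k + 1]


def is_mixed(modes):
    """A group is mixed if it holds at least one intra and at least one inter block."""
    return "intra" in modes and "inter" in modes
```

Only the macroblock modes are inspected, matching the statement that the decision depends only on the macroblock mode.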
In a first embodiment of the group of blocks processor as 
shown in FIG. 14A, the MB modes of the group of blocks 
1301 are modified to be inter-mode by the mode mapping 
1310. Therefore, the MV data for the intra-blocks are reset 
to zero by the motion vector processing, and the DCT 
coefficients corresponding to intra-blocks are also reset to 
zero by the DCT processing 1330. In this way, such 
sub-blocks that have been converted are replicated with data 
from the corresponding block in the reference frame. 
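
A minimal sketch of this first strategy follows. The dictionary layout (`mode`, `mv`, `dct`) and the function name `map_group_to_inter` are assumptions made for illustration. The patent describes the operation only at the block-diagram level.

```python
def map_group_to_inter(group):
    """Return a copy of the group with every block in inter-mode (FIG. 14A strategy).

    Intra blocks get a zero motion vector and all-zero DCT coefficients, so a
    decoder simply copies the co-located block from the reference frame.
    """
    mapped = []
    for block in group:
        if block["mode"] == "intra":
            zero_dct = [[0] * len(row) for row in block["dct"]]
            mapped.append({"mode": "inter", "mv": (0, 0), "dct": zero_dct})
        else:
            # Inter blocks pass through unchanged (copied, not aliased).
            mapped.append({"mode": "inter", "mv": block["mv"],
                           "dct": [list(row) for row in block["dct"]]})
    return mapped
```

Because no new residual is computed, this is the cheapest of the three strategies, at the cost of discarding the intra block's texture.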

In a second embodiment of the group of blocks processor 
as shown in FIG. 14B, the MB modes of the groups of mixed 
blocks are modified to be inter-mode by the mapping 1310. 
However, in contrast to the first preferred embodiment, the 
MV data for intra-MB's are predicted. The prediction is 
based on the data in neighboring blocks, which can include 
both texture and motion data. Based on this predicted motion 
vector, a new residual for the modified block is calculated. 
The final step 1320 resets the inter-DCT coefficients to 
intra-DCT coefficients. 

In a third embodiment shown in FIG. 14C, the MB modes 
of the group of blocks are modified 1310 to intra-mode. In 
this case, there is no motion information associated with the 
reduced-resolution macroblock, therefore all associated 
motion vector data are reset 1320 to zero. This is necessary 
to perform in the transcoder because the motion vectors of 
neighboring blocks are predicted from the motion of this 
block. To ensure proper reconstruction in the decoder, the 
MV data for the group of blocks must be reset to zero in the 
transcoder. The final step 1330 generates intra-DCT coefficients 
to replace the inter-DCT coefficients, as above. 

It should be noted that to implement the second and third 
embodiments described above, a decoding loop that reconstructs 
to full-resolution can be used. This reconstructed data 
can be used as a reference to convert the DCT coefficients 
between intra- and inter-modes, or inter- and intra-modes. 
However, the use of such a decoding loop is not required. 
Other implementations can perform the conversions within 
the drift compensating loops. 

For a sequence of frames with a small amount of motion 
and a low level of detail, the low complexity strategy of FIG. 
14A can be used. Otherwise, the equally complex strategies 
of either FIG. 14B or FIG. 14C should be used. The strategy 
of FIG. 14C provides the best quality. 
Drift Compensation with Block Processing 

It should be noted that the group of blocks processor 1300 
can also be used to control or minimize drift. Because 
intra-coded blocks are not subject to drift, the conversion of 
inter-coded blocks to intra-coded blocks lessens the impact 
of drift. 

As a first step 1350 of FIG. 14C, the amount of drift in the 
compressed bitstream is measured. In the closed-loop 
architectures, the drift can be measured according to the 
energy of the difference signal generated by 1092 and 1192 
or the drift compensating signal stored in 1091 and 1191. 
Computing the energy of a signal is a well-known method. 
The energy that is computed accounts for various 
approximations, including re-quantization, down-sampling 
and motion vector truncation. 

Another method for computing the drift, which is also 
applicable to open-loop architectures, estimates the error 
incurred by truncated motion vectors. It is known that 
half-pixel motion vectors in the original resolution lead to 
large reconstruction errors when the resolution is reduced. 
Full-pixel motion vectors are not subject to such errors 
because they can still be mapped correctly to half-pixel 
locations. Given this, one possibility to measure the drift is 
to record the percentage of half-pixel motion vectors. 
However, because the impact of the motion vector approximation 
depends on the complexity of the content, another 
possibility is that the measured drift be a function of the 
residual components that are associated with blocks having 
half-pixel motion vectors. 

The methods that use the energy of the difference signal 
and motion vector data to measure drift can be used in 
combination, and can also be considered over sub-regions in 
the frame. Considering sub-regions in the frame is advantageous 
because the location of macroblocks that benefit 
most from the drift compensation method can be identified. To use 
these methods in combination, the drift is measured by the 
energy of the difference signal, or drift compensating signal, 
for macroblocks having half-pixel motion vectors in the 
original resolution. 
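
The drift measures described above can be sketched as follows. All function names are hypothetical. The sketch assumes motion vectors are stored in half-pixel units, so an odd component marks a half-pixel vector, and it takes "energy" to mean the sum of squared samples.

```python
def signal_energy(samples):
    """Energy of a signal, taken here as the sum of squared sample values."""
    return sum(s * s for s in samples)


def is_half_pixel(mv):
    """True if either component is odd (assumes half-pixel units)."""
    return mv[0] % 2 != 0 or mv[1] % 2 != 0


def half_pixel_fraction(mvs):
    """Fraction of motion vectors that point to half-pixel locations."""
    if not mvs:
        return 0.0
    return sum(1 for mv in mvs if is_half_pixel(mv)) / len(mvs)


def combined_drift(macroblocks):
    """Energy of the difference signal, counted only for half-pixel macroblocks.

    macroblocks: list of (mv, difference_samples) pairs.
    """
    return sum(signal_energy(diff) for mv, diff in macroblocks if is_half_pixel(mv))
```

Applying `combined_drift` to the macroblocks of one sub-region at a time gives the per-region measure mentioned in the text.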

As a second step, the measured value of drift is translated 
into an "intra refresh rate" 1351 that is used as input to the 
group of blocks processor 1300. Controlling the percentage 
of intra-coded blocks has been considered in the prior art for 
encoding of video for error-resilient transmission, see for 
example "Analysis of Video Transmission over Lossy 
Channels," IEEE Journal on Selected Areas in Communications, 
by Stuhlmuller, et al., 2000. In that work, a back-channel 
from the receiver to the encoder is assumed to communicate 
the amount of loss incurred by the transmission channel, and 
the encoding of intra-coded blocks is performed directly 
from the source to prevent error propagation due to lost data 
in a predictive coding scheme. 

In contrast, the invention generates new intra-blocks in 
the compressed domain for an already encoded video, and 
the conversion from inter- to intra-mode is accomplished by 
the group of blocks processor 1300. 

If the drift exceeds a threshold amount of drift, the group 
of blocks processor 1300 of FIG. 14C is invoked to convert 
an inter-mode block to an intra-mode block. In this case, the 
conversion is performed at a fixed and pre-specified intra 
refresh rate. Alternatively, conversion can be done at an intra 
refresh rate that is proportional to the amount of drift 
measured. Also, rate-distortion characteristics of the signal 
can be taken into account to make appropriate trade-offs 
between the intra refresh rate and quantizers used for coding 
intra and inter blocks. 
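
The translation from measured drift to an intra refresh rate might look like the sketch below. The function name, the `gain` constant, and the cap `max_rate` are illustrative assumptions. The text only states that the rate is either fixed or proportional to the measured drift once a threshold is exceeded.

```python
def intra_refresh_rate(drift, threshold, fixed_rate=None, gain=0.01, max_rate=1.0):
    """Fraction of inter blocks to convert to intra-mode for a given drift.

    Below the threshold no conversion takes place. Above it, the rate is
    either a fixed pre-specified value or proportional to the drift
    (gain and max_rate are hypothetical tuning parameters).
    """
    if drift <= threshold:
        return 0.0
    if fixed_rate is not None:
        return fixed_rate
    return min(max_rate, gain * drift)
```

For example, `intra_refresh_rate(20, 10, fixed_rate=0.1)` selects the fixed policy, while omitting `fixed_rate` selects the proportional one. A rate-distortion trade-off against the quantizers, as mentioned in the text, is not modeled here.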

It should be noted that the invention generates new 
intra-blocks in the compressed domain, and this form of drift 
compensation can be performed in any transcoder with or 
without resolution reduction. 
Down-Sampling 

Any down-sampling method can be used by the 
transcoder according to the invention. However, the preferred 
down-sampling method is according to U.S. Pat. No. 
5,855,151, "Method and apparatus for down-converting a 
digital signal," issued on Nov. 10, 1998 to Sun et al., 
incorporated herein by reference. 

The concept of this down-sampling method is shown in 
FIG. 15A. A group includes four $2^N \times 2^N$ DCT blocks 1501. 
That is, the size of the group is $2^{N+1} \times 2^{N+1}$. A "frequency 
synthesis" or filtering 1510 is applied to the group of blocks 
to generate a single $2^N \times 2^N$ DCT block 1511. From this 
synthesized block, a down-sampled DCT block 1512 can be 
extracted. 

This operation has been described for the DCT domain 
using 2D operations, but the operations can also be performed 
using separable 1D filters. Also, the operations can 
be completely performed in the spatial domain. Equivalent 
spatial domain filters can be derived using the methods 
described in U.S. patent application Ser. No. 09/035,969, 
"Three layer scalable decoder and method of decoding," 
filed on Mar. 6, 1998 by Vetro et al., incorporated herein by 
reference. 






US 6,671,322 B2 
The main advantage of using the down-sampling method 
in the transcoder according to the invention is that the correct 
dimension of sub-blocks in the macroblock is obtained 
directly, e.g., from four 8x8 DCT blocks, a single 8x8 block 
can be formed. On the other hand, alternate prior art methods 
for down-sampling produce down-sampled data in a dimension 
that does not equal the required dimension of the 
outgoing sub-block of a macroblock, e.g., from four 8x8 
DCT blocks, four 4x4 DCT blocks are obtained. Then, an 
additional step is needed to compose a single 8x8 DCT 
block. 

The above filters are useful components to efficiently 
implement the architecture shown in FIG. 11 that requires 
up-sampling. More generally, the filters derived here can be 
applied to any system that requires arithmetic operations on 
up-sampled DCT data, with or without resolution reduction 
or drift compensation. 

Up-Sampling 

Any means of prior art up-sampling can be used in the 
present invention. However, Vetro, et al., in U.S. patent 
application "Three layer scalable decoder and method of 
decoding," see above, states that the optimal up-sampling 
method is dependent on the method of down-sampling. 
Therefore, the use of an up-sampling filter $x_u$ that corresponds 
to the down-sampling filter $x_d$ is preferred, where the 
relation between the two filters is given by, 

$$x_u = x_d^T (x_d x_d^T)^{-1}. \qquad (12)$$

There are two problems associated with the filters derived 
from the above equations. First, the filters are only applicable 
in the spatial domain because the DCT filters are 
not invertible. But, this is a minor problem because the 
corresponding spatial domain filters can be derived, then 
converted to the DCT-domain. 

However, the second problem is that the up-sampling 
filters obtained in this way correspond to the process shown 
in FIG. 15B. In this process, for example, a $2^N \times 2^N$ block 
1502 is up-sampled 1520 to a single $2^{N+1} \times 2^{N+1}$ block 1530. 
If up-sampling is performed entirely in the spatial domain, 
there is no problem. However, if the up-sampling is performed 
in the DCT domain, one has a $2^{N+1} \times 2^{N+1}$ DCT block 
to deal with, i.e., with one DC component. This is not 
suitable for operations that require the up-sampled DCT 
block to be in standard MB format, i.e., four $2^N \times 2^N$ DCT 
blocks, where N is 4. That is, the up-sampled blocks have the 
same format or dimensionality as the original blocks, there 
just are more of them. 

The above method of up-sampling in the DCT domain is 
not suitable for use in the transcoder described in this 
invention. In FIG. 11a, up-sampled DCT data are subtracted 
from DCT data output from the mixed block processor 1300. 
The two DCT data of the two blocks must have the same 
format. Therefore, a filter that can perform the up-sampling 
illustrated in FIG. 15C is required. Here, the single $2^N \times 2^N$ 
block 1502 is up-sampled 1540 to four $2^N \times 2^N$ blocks 1550. 
Because such a filter has not yet been considered and does 
not exist in the known prior art, an expression for the 1D case 
is derived in the following. 

With regard to notation in the following equations, lowercase 
variables indicate spatial domain signals, while 
uppercase variables represent the equivalent signal in the 
DCT domain. 

As illustrated in FIG. 16, C 1601 represents the DCT 
block to be up-sampled in the DCT domain, and c 1602 
represents the equivalent block in the spatial domain. The 

Cosine Transform: Algorithms, Advantages and 
Applications," Academic, Boston, 1990. For convenience, 
the expressions are also given below. 

The DCT definition is 

$$C_q = z_q \sqrt{\frac{2}{N}} \sum_{i=0}^{N-1} c_i \cos\left(\frac{(2i+1)q\pi}{2N}\right), \text{ and} \qquad (13)$$

the IDCT definition is 

$$c_j = \sqrt{\frac{2}{N}} \sum_{q=0}^{N-1} z_q C_q \cos\left(\frac{(2j+1)q\pi}{2N}\right), \qquad (14)$$

where 

$$z_q = \begin{cases} \frac{1}{\sqrt{2}}; & q = 0 \\ 1; & q \neq 0. \end{cases} \qquad (15)$$

Given the above, block E 1610 represents the up-sampled 
DCT block based on filtering C with $X_u$ 1611, and e 
represents the up-sampled spatial domain block based on 
filtering c with the $x_u$ 1621 given by equation (12). Note that 
e and E are related through a 2N-pt DCT/IDCT 1630. The 
input-output relations of the filtered input are given by, 

$$E_k = \sum_{q=0}^{N-1} C_q X_u(k, q); \quad 0 \le k \le 2N-1, \text{ and} \qquad (16a)$$

$$e_i = \sum_{j=0}^{N-1} c_j x_u(i, j); \quad 0 \le i \le 2N-1. \qquad (16b)$$

As shown in FIG. 16, the desired DCT blocks are denoted 
by A 1611 and B 1612. The aim of this derivation is to derive 
filters $X_{ca}$ 1641 and $X_{cb}$ 1642 that can be used to compute 
A and B directly from C, respectively. 

As the first step, equation (14) is substituted into equation 
(16b). The resulting expression is the spatial domain output 
e as a function of the DCT input C, which is given by, 

$$e_i = \sum_{q=0}^{N-1} C_q \left[ z_q \sqrt{\frac{2}{N}} \sum_{j=0}^{N-1} x_u(i, j) \cos\left(\frac{(2j+1)q\pi}{2N}\right) \right]. \qquad (17)$$

To express A and B in terms of C using equation (17), the 
spatial domain relationship between a, b and e is 

$$a_i = e_i; \; 0 \le i \le N-1, \qquad b_{i-N} = e_i; \; N \le i \le 2N-1, \qquad (18)$$

where i in the above denotes the spatial domain index. The 
DCT domain expression for a is given by, 

$$A_k = z_k \sqrt{\frac{2}{N}} \sum_{i=0}^{N-1} a_i \cos\left(\frac{(2i+1)k\pi}{2N}\right)$$

two blocks are related to one another through the definition 
of the N-pt DCT and IDCT 1603, see Rao and Yip, "Discrete 
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Using equations (17)-(1 9 ) gi ves > 



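$$A_k = z_k \sqrt{\frac{2}{N}} \sum_{i=0}^{N-1} \cos\left( \frac{(2i+1)k\pi}{2N} \right) \sum_{q=0}^{N-1} C_q \left[ \sqrt{\frac{2}{N}} z_q \sum_{j=0}^{N-1} x_u(i,j) \cos\left( \frac{(2j+1)q\pi}{2N} \right) \right] \qquad (20)$$

which is equivalently expressed as

$$A_k = \sum_{q=0}^{N-1} C_q X_{ca}(k,q) \qquad (21)$$

where

$$X_{ca}(k,q) = \frac{2}{N} z_k z_q \sum_{i=0}^{N-1} \cos\left( \frac{(2i+1)k\pi}{2N} \right) \sum_{j=0}^{N-1} x_u(i,j) \cos\left( \frac{(2j+1)q\pi}{2N} \right) \qquad (22)$$

Similarly,

$$B_k = z_k \sqrt{\frac{2}{N}} \sum_{i=0}^{N-1} \cos\left( \frac{(2i+1)k\pi}{2N} \right) \sum_{q=0}^{N-1} C_q \left[ \sqrt{\frac{2}{N}} z_q \sum_{j=0}^{N-1} x_u(i+N,j) \cos\left( \frac{(2j+1)q\pi}{2N} \right) \right] \qquad (23)$$

which is equivalently expressed as

$$B_k = \sum_{q=0}^{N-1} C_q X_{cb}(k,q) \qquad (24)$$

where

$$X_{cb}(k,q) = \frac{2}{N} z_k z_q \sum_{i=0}^{N-1} \cos\left( \frac{(2i+1)k\pi}{2N} \right) \sum_{j=0}^{N-1} x_u(i+N,j) \cos\left( \frac{(2j+1)q\pi}{2N} \right) \qquad (25)$$
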

The above filters can then be used to up-sample a single 
block of a given dimension to a larger number of blocks, 
each having the same dimension as the original block. More 
generally, the filters derived here can be applied to any 
system that requires arithmetic operations on up-sampled 
DCT data.
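
The 1D derivation can be checked numerically. The following is a minimal sketch only, not the implementation of the invention: it assumes the orthonormal N-pt DCT/IDCT pair used in the derivation, and it assumes a simple sample-replication up-sampling filter for x_u, chosen here purely for illustration. The function names (dct, idct, dct_upsampling_filters, replication_filter, apply_filter) are hypothetical. The sketch builds the two filter matrices of equations (22) and (25) and confirms that applying them to C yields DCT blocks whose inverse transforms are the two halves of the up-sampled spatial signal.

```python
import math

def z(q):
    # Normalization factor of equation (15).
    return 1.0 / math.sqrt(2.0) if q == 0 else 1.0

def dct(x):
    # Orthonormal N-pt DCT of a list of samples.
    n = len(x)
    return [z(q) * math.sqrt(2.0 / n) *
            sum(x[i] * math.cos((2 * i + 1) * q * math.pi / (2 * n))
                for i in range(n))
            for q in range(n)]

def idct(X):
    # Orthonormal N-pt IDCT, as in equation (14).
    n = len(X)
    return [math.sqrt(2.0 / n) *
            sum(z(q) * X[q] * math.cos((2 * j + 1) * q * math.pi / (2 * n))
                for q in range(n))
            for j in range(n)]

def replication_filter(n):
    # ASSUMPTION: a 2N x N spatial filter that repeats each sample twice,
    # e_i = c_(i // 2). Any other 2N x N filter x_u could be used instead.
    return [[1.0 if j == i // 2 else 0.0 for j in range(n)]
            for i in range(2 * n)]

def dct_upsampling_filters(x_u):
    # Build X_ca (equation (22)) and X_cb (equation (25)) from x_u.
    n = len(x_u[0])

    def build(offset):
        # offset = 0 selects rows 0..N-1 of x_u, offset = N rows N..2N-1.
        return [[(2.0 / n) * z(k) * z(q) *
                 sum(math.cos((2 * i + 1) * k * math.pi / (2 * n)) *
                     sum(x_u[i + offset][j] *
                         math.cos((2 * j + 1) * q * math.pi / (2 * n))
                         for j in range(n))
                     for i in range(n))
                 for q in range(n)]
                for k in range(n)]

    return build(0), build(n)

def apply_filter(X, C):
    # Equations (21) and (24): one output coefficient per filter row.
    return [sum(row[q] * C[q] for q in range(len(C))) for row in X]

N = 8
c = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]   # spatial block
C = dct(c)                                      # DCT block to up-sample
X_ca, X_cb = dct_upsampling_filters(replication_filter(N))
A = apply_filter(X_ca, C)                       # first N-pt DCT block
B = apply_filter(X_cb, C)                       # second N-pt DCT block
a = idct(A)                                     # first half of e
b = idct(B)                                     # second half of e
```

With the replication filter assumed above, a and b come out as the first and second halves of the spatially up-sampled signal, so A and B are two ordinary N-pt DCT blocks rather than one 2N-pt block.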

To implement the filters given by equations (22) and (25),
it is noted that each expression provides a k×q matrix of
filter taps, where k is the index of an output pixel and q is
the index of an input pixel. For 1D data, the output pixels are
computed as a matrix multiplication. For 2D data, two steps
are taken. First, the data is up-sampled in a first direction,
e.g., horizontally. Then, the horizontally up-sampled data is
up-sampled in the second direction, e.g., vertically. The
order of direction for up-sampling can be reversed without
having any impact on the results.

For horizontal up-sampling, each row in a block is oper- 
ated on independently and treated as an N-dimensional input 
vector. Each input vector is filtered according to equations 
(21) and (24). The output of this process will be two standard
DCT blocks. 

For vertical up-sampling, each column is operated on 
independently and again treated as an N-dimensional input 
vector. As with the horizontal up-sampling, each input 
vector is filtered according to equations (21) and (24). The 
output of this process will be four standard DCT blocks as 
shown in FIG. 15C.
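
The two-pass procedure for 2D data can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the invention: the filters are written in matrix form as T·x_u·Tᵀ, where T is the orthonormal N-pt DCT matrix (this is equations (22) and (25) rewritten as matrix products), and x_u is again assumed to be a simple sample-replication filter. All function names are hypothetical.

```python
import math

def dct_matrix(n):
    # T[k][i] = z_k * sqrt(2/N) * cos((2i+1) k pi / (2N)), orthonormal.
    return [[(1.0 / math.sqrt(2.0) if k == 0 else 1.0) * math.sqrt(2.0 / n)
             * math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def replication_filter(n):
    # ASSUMPTION: 2N x N spatial filter repeating each sample twice.
    return [[1.0 if j == i // 2 else 0.0 for j in range(n)]
            for i in range(2 * n)]

def dct_filters(x_u):
    # Matrix form of equations (22) and (25):
    # X_ca = T * x_u[0:N] * T^T and X_cb = T * x_u[N:2N] * T^T.
    n = len(x_u[0])
    t = dct_matrix(n)
    tt = transpose(t)
    return matmul(matmul(t, x_u[:n]), tt), matmul(matmul(t, x_u[n:]), tt)

def upsample_2d(C, x_ca, x_cb):
    # Horizontal pass: every row of C is filtered as an input vector,
    # giving two DCT blocks (left and right).
    left = matmul(C, transpose(x_ca))
    right = matmul(C, transpose(x_cb))
    # Vertical pass: every column of each intermediate block is filtered,
    # giving four DCT blocks arranged as [[top-left, top-right],
    # [bottom-left, bottom-right]].
    return [[matmul(x_ca, left), matmul(x_ca, right)],
            [matmul(x_cb, left), matmul(x_cb, right)]]

n = 4
c = [[float(4 * r + s) for s in range(n)] for r in range(n)]  # pixels
t = dct_matrix(n)
C = matmul(matmul(t, c), transpose(t))                        # 2D DCT
x_ca, x_cb = dct_filters(replication_filter(n))
blocks = upsample_2d(C, x_ca, x_cb)
# Inverse 2D DCT of each output block, for inspection only.
spatial = [[matmul(matmul(transpose(t), blk), t) for blk in row]
           for row in blocks]
```

Each of the four entries of blocks has the same dimension as C. Under the replication assumption, their inverse transforms are the four quadrants of the pixel-replicated image, and swapping the order of the two passes produces the same four blocks.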



Syntax Conversion

As stated for the above applications of the transcoder 
according to the invention, one of the key applications for 
this invention is MPEG-2 to MPEG-4 conversion. Thus far, 
the focus is mainly on the architectures used for drift 
compensation when transcoding to a lower spatial resolution 
and additional techniques that support the conversion to 
lower spatial resolutions. 

However, syntax conversion between standard coding 
schemes is another important issue. Because we believe that 
this has been described by patent applications already 
pending, we do not provide any further details on this part. 

Although the invention has been described by way of 
examples of preferred embodiments, it is to be understood 
that various other adaptations and modifications can be 
made within the spirit and scope of the invention. Therefore, 
it is the object of the appended claims to cover all such 
variations and modifications as come within the true spirit 
and scope of the invention. 

We claim: 

1. A method for transcoding groups of macroblocks of a 
partially decoded input bitstream, the groups including 
intra-mode and inter-mode macroblocks, and each macrob- 
lock including DCT coefficients, and a motion vector, com- 
prising: 

mapping the modes of each group of macroblocks of the 
partially decoded input bitstream to be identical only if 
there is an inter-mode macroblock and an intra-mode 
macroblock in the group, and modifying the DCT 
coefficients and the motion vector in accordance with 
the mapping for each changed macroblock; and 

down-sampling each group of macroblocks to generate 
reduced-resolution macroblock for an output com- 
pressed bit stream. 

2. The method of claim 1 wherein the mode of each 
changed macroblock is mapped to inter-mode, and the 
motion vector and the DCT coefficients of each changed
macroblock is set to zero if the partially decoded input 
bitstream has a relatively small amount of motion. 

3. The method of claim 1 wherein the mode of each 
changed macroblock is mapped to inter-mode, and the 
motion vector of the changed block is predicted, and the 
DCT coefficients of the changed macroblock are converted 
to inter-mode if the bitstream has a relatively large amount 
of motion. 

4. The method of claim 1 wherein the mode of each 
changed macroblock is mapped to intra-mode, and the 
motion vector of the changed macroblock is set to zero, and 
the DCT coefficients of the changed macroblock are converted
to intra-mode if the bitstream has a relatively large
amount of motion. 

5. The method of claim 1 wherein the down-sampling 
includes mapping the motion vector to a low-resolution 
motion vector and further comprising: 

variable length decoding a compressed bitstream to gen- 
erate inverse DCT coefficients and the motion vector of 
the partially decoded bit stream; 

inverse quantizing the inverse DCT coefficients using a 
first spatial quantizer to obtain the DCT coefficients;

quantizing each reduced-resolution macroblock using a 
second spatial quantizer; and 

variable length coding each quantized reduced-resolution 
macroblock and the low-resolution motion vector. 

6. The method of claim 1 further comprising:

generating a reduced-resolution drift-compensating signal
for each down-sampled macroblock; and

adding the reduced-resolution drift-compensating signal
to each down-sampled macroblock to compensate for
drift in the output compressed bitstream.

7. The method of claim 1 further comprising:

generating a full-resolution drift compensating signal for
each down-sampled macroblock;

adding each full-resolution drift-compensating signal to
each macroblock of the group.

8. The method of claim 7 further comprising:

subtracting an inverse quantized signal and up-sampled
signal from an original resolution reference signal to
generate the full-resolution signal.

9. The method of claim 7 further comprising:

subtracting an inverse quantized signal from a reduced
resolution reference signal; and

up-sampling the reduced resolution difference signal to
generate the full-resolution signal.

10. The method of claim 7 further comprising:

subtracting an inverse quantized and up-sampled signal
from an original resolution reference signal to generate
the full-resolution signal.

11. The method of claim 1 further comprising: 

generating a reduced-resolution difference signal for each 
down-sampled macroblock; 

up-sampling each reduced-resolution difference signal to 
a full-resolution drift-compensation signal; and 

adding each full-resolution drift-compensating signal to 
each macroblock of the group. 

12. The method of claim 1 further comprising: 

generating a full-resolution difference signal for each 
down-sampled macroblock; and

adding each full-resolution drift-compensating signal to 
each macroblock of the group. 



13. The method of claim 1 wherein each macroblock
includes $2^N \times 2^N$ pixels, and the down-sampling further
comprises:

filtering the group of $2^N \times 2^N$ macroblocks to generate a
single $2^N \times 2^N$ macroblock.

14. The method of claim 1 wherein the partially decoded 
input bitstream is in MPEG-2 format, and the compressed 
output bitstream is in MPEG-4 format.

15. The method of claim 1 wherein the transcoding is 
performed in an adaptive server of a multimedia content
distribution system. 

16. The method of claim 1 wherein the transcoding is 
performed in a transcoder of a home network. 

17. The method of claim 1 further comprising:

applying a plurality of DCT filters to the DCT coefficients
of each macroblock to generate a plurality of
up-sampled macroblocks for each macroblock, there
being one up-sampled macroblock generated by each
filter, and where the macroblock and up-sampled macroblock
has an identical dimensionality.

18. An apparatus for transcoding groups of macroblocks 
of a partially decoded input bitstream, the groups including 
intra-mode and inter-mode macroblocks, and each macroblock
including DCT coefficients, and a motion vector,
comprising:

means for mapping the modes of each group of macrob- 
locks of the partially decoded input bitstream to be 
identical only if there is an inter-mode macroblock and 
an intra-mode macroblock in the group, and modifying 
the DCT coefficients and the motion vector in accor- 
dance with the mapping for each changed macroblock; 
and 

means for down-sampling each group of macroblocks to 
generate reduced-resolution macroblock for an output 
compressed bitstream.